Abstract

Combining type theory, language design, and empirical work, 
we present techniques for computing with large and dynamically 
changing datasets. Based on lambda calculus, our techniques are 
suitable for expressing a diverse set of algorithms on large datasets 
and, via self-adjusting computation, enable computations to respond 
automatically to changes in their data. To improve the scalability 
of self-adjusting computation, we present a type system for precise 
dependency tracking that minimizes the time and space for storing 
dependency metadata. The type system eliminates an important 
assumption of prior work that can lead to recording spurious 
dependencies. We present a type-directed translation algorithm that 
generates correct self-adjusting programs without relying on this 
assumption. We then show a probabilistic-chunking technique to 
further decrease space usage by controlling the fundamental space- 
time tradeoff in self-adjusting computation. We implement and 
evaluate these techniques, showing promising results on challenging 
benchmarks involving large graphs. 

Categories and Subject Descriptors D.1.1 [Programming Techniques]: Applicative (Functional) Programming; F.3.3 [Logics and Meanings of Programs]: Studies of Program Constructs

Keywords Self-adjusting computation; information-flow type sys- 
tem; granularity control; incremental graph algorithms; performance 

1. Introduction 

Recent advances in the ability to collect, store, and process large 
amounts of information, often represented in the form of graphs, 
have led to a plethora of research on "big data." In addition to 
being large, such datasets are diverse, arising in many domains 
ranging from scientific applications to social networks, and dynamic, 
meaning they change gradually over time. For example, a social- 
network graph changes as users join and leave, or as they change 
their set of friends. Prior research on languages and programming 
systems for big-data applications has two important limitations: 
• Due to their diversity, big-data applications benefit from expressive programming languages. Yet existing work offers domain-specific languages and systems such as MapReduce [20] with limited expressiveness, which restricts not only the set of problems that can be solved but also how efficiently they can be solved (by limiting the algorithms that can be implemented).
• Even though big-data applications often require operating on dynamically changing datasets, many existing languages and systems provide a batch model of computation, where the data is assumed to be static or unchanging.

In this paper, we show that, when combined with the right set of techniques, functional programming can help overcome both of these limitations. First, as an expressive, general-purpose programming model, functional programming enables efficient implementations of a broad range of algorithms for big data. Second, since functional programming is consistent with self-adjusting computation [1, 6, 15-17], it can also enable programs to respond efficiently to changing data, provided that a major limitation of self-adjusting computation, namely space usage, can be overcome.

Self-adjusting computation [1, 6, 15-17] refers to a technique for compiling batch programs into programs that can automatically respond to changes to their data. The idea behind self-adjusting computation is to establish a space-time tradeoff so that the results of prior computations can be reused when computing the result for a different but similar input. Self-adjusting computation achieves this by representing the execution of a program as a higher-order graph data structure called a dynamic dependency graph, which records certain dependencies in the computation, and by using a change-propagation algorithm to update this graph and the computation. In a nutshell, change propagation identifies and rebuilds (via re-execution) only the parts of the computation that are affected by the changes. Unfortunately, existing approaches to self-adjusting computation require a significant amount of memory to store the dynamic dependency graph. For example, on a modest input of $10^7$ integers, a self-adjusting version of merge sort uses approximately 100x more space than its batch counterpart. Such space demands can limit its applicability to relatively small problem sizes.
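To make the dynamic dependency graph and change propagation more concrete, here is a deliberately tiny Python sketch. The names Mod, read, write, and propagate are our own illustrative choices, not the API of any self-adjusting library. The sketch records which reads depend on each modifiable and re-executes only those reads after a write. Unlike a real change-propagation algorithm, it neither discards stale reads nor orders re-execution by time.

```python
from collections import deque

class Mod:
    """A modifiable: a value plus the read closures that depend on it."""
    def __init__(self, value=None):
        self.value = value
        self.readers = []  # closures recorded by read(): edges of the dependency graph

pending = deque()  # (modifiable, reader) pairs invalidated by write()

def read(m, body):
    """Run body on m's current value and record the dependency m -> body."""
    m.readers.append(body)
    body(m.value)

def write(m, value):
    """Update m; if its value changed, schedule every read that depends on it."""
    if m.value != value:
        m.value = value
        pending.extend((m, r) for r in m.readers)

def propagate():
    """Re-execute only the affected reads. Simplification: stale reads are never
    discarded and re-execution is not ordered by time, as a real system requires."""
    while pending:
        m, body = pending.popleft()
        body(m.value)

# Usage: two independent computations over two inputs.
runs = {"double": 0, "negate": 0}
x, y = Mod(3), Mod(7)
dx, ny = Mod(), Mod()

def double(v):
    runs["double"] += 1
    write(dx, 2 * v)

def negate(v):
    runs["negate"] += 1
    write(ny, -v)

read(x, double)   # initial, from-scratch run
read(y, negate)
write(x, 10)      # change one input ...
propagate()       # ... and update the output
print(dx.value, ny.value, runs)  # 20 -7 {'double': 2, 'negate': 1}
```

Only the computation that read x runs again; the one that read y is reused unchanged.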

This paper presents two techniques for improving space usage 
of self-adjusting computation. The first technique reduces space 
usage by improving the precision of dependency tracking that self- 
adjusting computation relies on. The second technique enables 
programmers to control the space-time tradeoff fundamental to self- 
adjusting computation. Our first technique relies on a type system 
for precisely tracking dependencies and a type-directed translation 
algorithm that can generate correct and efficient self-adjusting 
programs. Our second technique is a probabilistic chunking scheme 
for coarsening the granularity at which dependencies are tracked 
without disproportionately degrading the update performance. 

Our starting point is the recent work on type-based automatic incrementalization [15, 17]. That work enables translating batch programs into self-adjusting programs that can efficiently respond to incremental changes. The idea behind the approach is to use a type inference algorithm to infer all changeable data, which change over time, and to track only their dependencies. Unfortunately, the type inference algorithm can identify non-changeable data as changeable, causing redundant dependencies to be recorded. The reason for this is the modal type system that all previous work on self-adjusting computation relies on [6]. That modal type system ensures a crucial property, that all relevant dependencies are tracked, but at the cost of being conservative: it disallows changeable data from being nested inside non-changeable data, which leads to redundant dependencies.

We solve this problem by designing a more refined type system (Section 3) and a translation algorithm (Section 6) that can correctly translate source programs (Section 4) into a lower-level target language (Section 5). Our source-level type system is an information-flow type system that enables precise dependency tracking. This type system breaks an important assumption of prior work by allowing changeable data to be nested inside non-changeable data. We present a translation algorithm that nevertheless produces correct self-adjusting executables by emitting target code written in a destination-passing style. To provide the flexibility needed for operating on changeables without creating redundant dependencies, the target language is imperative but relies on a type and effect system [52] for correctness, guaranteeing that all dependencies are tracked. We prove that the translation generates well-typed and sound target code that is consistent with the source typing, which guarantees the correctness of the resulting self-adjusting code in the target language.

When combined with an important facility in self-adjusting computation, namely the ability to control the granularity of dependency tracking by selectively tracking dependencies, precise dependency tracking offers a powerful mechanism to control the space-time tradeoff fundamental to self-adjusting computation. By tracking dependencies at the level of (large) blocks of data, rather than individual data items, the programmer can further reduce space consumption. As we describe, however, this straightforward idea can lead to disproportionately slow updates (Section ?), because a small change can propagate to many blocks. We overcome this problem by presenting a probabilistic blocking technique. This technique divides the data into blocks in a probabilistic way, ensuring that small changes affect a small number of blocks.

We implement the proposed techniques in Standard ML and 
present an empirical evaluation by considering several list primitives, 
sorting algorithms, and more challenging algorithms for large graphs 
such as PageRank, graph connectivity, and approximate social- 
circle size. These problems, which are highly unstructured, put 
our techniques through a serious test. Our empirical evaluation leads 
to the following conclusions. 

• Expressive languages such as lambda calculus augmented with simple type annotations, instead of domain-specific languages such as MapReduce, can lead to large (e.g., 50-100x) improvements in time and space efficiency.

• The type system for precise dependency tracking can significantly reduce space and time requirements (e.g., by approximately 2x and 6x, respectively, for MapReduce applications).

• Our techniques for controlling the space-time tradeoff for list 
data structures can reduce memory consumption effectively 
while only proportionally slowing down updates. 

• Our techniques can enable responding significantly faster (e.g., 
several orders of magnitude or more) to both small and aggregate 
changes while moderately increasing memory usage compared 
to the familiar batch model of computation. 



2. Background and Overview 

Using a simple list-partitioning function, we illustrate the self- 
adjusting computation framework, outline two limitations of previ- 
ous approaches, and describe how we resolve them. 

2.1 Background and List Partition 

Figure 1 shows SML code for a list-partition function partition f l, which applies f to each element x of l, from left to right, and returns a pair (pos, neg), where pos is the list of elements for which f x evaluated to true, and neg is the list of those for which f x evaluated to false. The elements of pos and neg retain their relative order from l. Ignoring the annotation C, this is the same function as in the SML basis library, which takes $\Theta(n)$ time for a list of size n.
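For reference, the batch partition of Figure 1 (left) can be transcribed into Python as a sketch. We represent an SML list as nested pairs (None for nil, (h, t) for h::t) so that each step is constant time; the helpers from_py and to_py are hypothetical conveniences, not part of the original code.

```python
# A list is None (nil) or a pair (head, tail), mirroring SML's h::t.
def partition(f, l):
    """Return (pos, neg): elements where f is true / false, in their original order."""
    if l is None:
        return None, None
    h, t = l
    a, b = partition(f, t)
    return ((h, a), b) if f(h) else (a, (h, b))

def from_py(xs):
    """Build a nested-pair list from a Python list (hypothetical helper)."""
    l = None
    for x in reversed(xs):
        l = (x, l)
    return l

def to_py(l):
    """Convert a nested-pair list back to a Python list (hypothetical helper)."""
    out = []
    while l is not None:
        out.append(l[0])
        l = l[1]
    return out

pos, neg = partition(lambda x: x > 0, from_py([3, -1, 2, -5]))
print(to_py(pos), to_py(neg))  # [3, 2] [-1, -5]
```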

Self-adjusting computation enables the programmer to develop efficient incremental programs by annotating the code for the non-incremental, or batch, programs. The key language construct is a modifiable (reference), which stores a changeable value that may change over time [6]. The runtime system of a self-adjusting language tracks dependencies on modifiables in a dynamic dependency graph, enabling efficient change propagation when the data changes in small amounts.

Developing a self-adjusting program can involve significant changes to the batch program. Recent work [15, 17] proposes a type-directed approach for automatically deriving self-adjusting programs via simple type annotations. For example, given the code in the leftmost column of Figure 1 and the annotation C (the second line) that marks the tail of the list changeable, the compiler automatically derives the code in the center column.

These type annotations, broadly referred to as level types, partition all data types into stable and changeable levels. Programmers only need to annotate the types of changeable data with C; all other types remain stable, meaning they cannot be changed later on. For example, $\mathsf{int}^S$ is a stable integer, $\mathsf{int}^C$ is a changeable integer, and $\mathsf{int}^S\ \mathsf{list}^C$ is a changeable list of stable integers. This list allows insertion and deletion, but each individual element cannot be altered.

In the translated code (Figure 1, center), changeable data are stored in modifiables: a changeable int becomes an int mod. Given the self-adjusting list-partition function, we can run it in much the same way as the batch version. After a complete first run, we can change any or all of the changeable data and update the output by performing change propagation. As an example, consider inserting an element into the input list and performing change propagation. This triggers computation only on the newly inserted element, without recomputing the whole list. It is straightforward to show that change propagation takes O(1) time for a single insertion.
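The data representation behind a type such as $\mathsf{int}^S\ \mathsf{list}^C$ can be sketched in Python, assuming a hypothetical Mod class that simply holds a value: every tail is a modifiable, while each head is a plain integer. Inserting an element writes exactly one modifiable. In a self-adjusting runtime that write would then be followed by change propagation, which this sketch omits.

```python
class Mod:
    """A modifiable reference holding a changeable value (hypothetical stand-in)."""
    def __init__(self, value=None):
        self.value = value

# int^S list^C: a list is a Mod holding None (nil) or (head, tail), where the head
# is a plain, stable int and the tail is again a Mod.
def from_py(xs):
    tail = Mod(None)
    for x in reversed(xs):
        tail = Mod((x, tail))
    return tail

def to_py(m):
    out, cell = [], m.value
    while cell is not None:
        head, tail = cell
        out.append(head)
        cell = tail.value
    return out

def insert(m, i, x):
    """Insert x at position i (0 <= i <= length) by writing a single modifiable."""
    for _ in range(i):
        m = m.value[1]              # follow i modifiable tails
    m.value = (x, Mod(m.value))     # the only write: splice in a new cell

lst = from_py([1, 2, 3])
insert(lst, 1, 9)
print(to_py(lst))  # [1, 9, 2, 3]
```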

2.2 Limitation 1: Redundant Dependencies 

The problem. As with all other prior work on self-adjusting computation (e.g., [1]) that relies on a type system to eliminate difficult correctness problems (in change propagation), recent work [15-17] uses a modal type system to guarantee properties important to the correctness of self-adjusting computation: all changeables are initialized and all their dependencies are tracked. This type system can be conservative and disallow changeable data from being nested inside stable data. For example, in list partition, the type system forces the return type to be changeable, i.e., the type ('a list mod * 'a list mod) mod. This type is conservative; the outer modifiable (mod) is unnecessary, as any observable change can be performed without it. By requiring the outer modifiable, the type system causes redundant dependencies to be recorded. In this simple example, this can nearly double the space usage while also degrading performance (likely by as much as an order of magnitude).
(* Left: ordinary *)
fun partition f l
  : ('a list^C * 'a list^C) =
  case l of
    nil => (nil, nil)
  | h::t =>
      let val (a, b) = partition f t
      in if f h then (h::a, b)
         else (a, h::b)
      end

(* Center: self-adjusting *)
fun partition f l
  : ('a list mod * 'a list mod) mod =
  read l as l' in
  case l' of
    nil => write (mod (write nil), mod (write nil))
  | h::t =>
      let val pair = mod (partition f t)
      in if f h then read pair as (a, b) in
           write (mod (write h::a), b)
         else read pair as (a, b) in
           write (a, mod (write h::b))
      end

(* Right: destination passing *)
fun partition f l (l00, l01)
  : ('a list mod * 'a list mod) =
  let val () = read l as l' in
    case l' of
      nil => (write (l00, nil);
              write (l01, nil))
    | h::t =>
        let val (a, b) = let
              val (l00, l01) = (mod nil, mod nil)
            in partition f t (l00, l01)
            end
        in if f h then (write (l00, h::a);
                        read b as b' in write (l01, b'))
           else (read a as a' in write (l00, a');
                 write (l01, h::b))
        end
  in (l00, l01) end
Figure 1. The list partition function: ordinary (left), self-adjusting (center), and with destination passing (right). 



Our solution. We can circumvent this problem by using unsafe, imperative operations. For our running example, partition can be rewritten as shown in Figure 1 (right), in a destination-passing style. The code takes an input list and two destinations, which are recorded separately. Without the restrictions of the modal type system, it can return ('a list mod * 'a list mod), as desired.

A major problem with this approach, however, is correctness: 
a simple mistake in using the imperative constructs can lead to 
errors in change propagation that are extremely difficult to identify. 
We therefore would like to derive the efficient, imperative version 
automatically from its purely functional version. There are three main challenges to such a translation. (1) The source language has to
identify which data is written to which part of the aggregate data 
types. (2) All changeable data should be placed into modifiables and 
all their dependencies should be tracked. (3) The target language 
must verify that self-adjusting constructs are used correctly to ensure 
correctness of change propagation. 

To address the first challenge, we enrich the type system with information-flow types to check dependencies among different components of the changeable pairs. We introduce labels $\rho$ into the changeable level annotations, denoted as $C_\rho$. The label serves as an identifier for modifiables. For each function of type $\tau_1 \to \tau_2$, we give labels for the return type $\tau_2$. The information-flow type system then infers the dependencies for each label in the function body. These labels decide which data goes into which modifiable in the translated code.

To address the second challenge, the translation algorithm takes the inferred labels from the source program and conducts a type-directed translation to generate self-adjusting programs in destination-passing style. Specifically, the labels in the function return type are translated into destinations (modifiables) in the target language, and expressions that have labeled level types are translated into explicit writes into their corresponding modifiables. Finally, we wrap the destinations into the appropriate type and return the value.

As an example, consider how we derive the imperative self-adjusting program for list partition, starting from the purely functional implementation in the leftmost column of Figure 1. First, we mark the return type of the partition function as $(\alpha\ \mathsf{list}^{C_{00}} \times \alpha\ \mathsf{list}^{C_{01}})^S$, which indicates that the return value has two destinations $l_{00}$ and $l_{01}$, and that the translated function will take, besides the original arguments f and l, two modifiables $l_{00}$ and $l_{01}$ as arguments. Then an information-flow type system infers that the expression (h::a, b) in Figure 1 (left) has type $(\alpha\ \mathsf{list}^{C_{00}} \times \alpha\ \mathsf{list}^{C_{01}})^S$. Using this label information, the compiler generates the target expression write (l00, h::a); write (l01, b). Finally, the translated function returns the destinations as a pair (l00, l01). Figure 1 (right) shows the translated code for list partition.
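The destination-passing version in Figure 1 (right) can be mirrored in Python as a sketch. Here Mod, read, and write are hypothetical stand-ins that do not record dependencies; the point is only the data flow: the caller allocates the two destinations, every branch writes both of them, and the function returns the pair of destinations rather than a modifiable holding a pair.

```python
class Mod:
    """A modifiable used as a destination; a list is a Mod holding None or (head, Mod)."""
    def __init__(self, value=None):
        self.value = value

def read(m):
    return m.value      # no dependency is recorded in this sketch

def write(m, v):
    m.value = v

def partition(f, l, l00, l01):
    """Fill destination l00 with the positives and l01 with the rest, then return both."""
    cell = read(l)
    if cell is None:
        write(l00, None)
        write(l01, None)
    else:
        h, t = cell
        a, b = Mod(), Mod()         # fresh destinations for the recursive call
        partition(f, t, a, b)
        if f(h):
            write(l00, (h, a))      # h::a, whose tail is destination a
            write(l01, read(b))     # copy b's contents into l01
        else:
            write(l00, read(a))
            write(l01, (h, b))
    return l00, l01

def from_py(xs):
    tail = Mod(None)
    for x in reversed(xs):
        tail = Mod((x, tail))
    return tail

def to_py(m):
    out, cell = [], m.value
    while cell is not None:
        out.append(cell[0])
        cell = cell[1].value
    return out

pos, neg = partition(lambda x: x > 0, from_py([3, -1, 2, -5]), Mod(), Mod())
print(to_py(pos), to_py(neg))  # [3, 2] [-1, -5]
```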



To address the third challenge, we design a new type system for the imperative target language. The type system distinguishes between fresh modifiables and finalized modifiables. The typing rules enforce that all modifiables are finalized before being read, and that a function fills in all of its destinations, no matter which control branch the program takes. We further prove that the translation rules generate target programs that are of the appropriate type and are type safe.

2.3 Limitation 2: Dependency Metadata

The problem. Even with precise dependency tracking, self- 
adjusting programs can require large amounts of memory, making 
them difficult to scale to large inputs. One culprit is the dynamic 
dependency graph that stores operations on modifiables. For ex- 
ample, the list partition function contains about n read operations. 
Our experiments show, for example, that self-adjusting list partition 
requires 41x more memory than its batch counterpart. In principle, 
there is a way around this: simply treat blocks of data as a changeable unit instead of treating each data item as a changeable. However,
it turns out to be difficult to make this work because doing so can 
disproportionately degrade performance. 

At a very high level, self-adjusting computation may be seen as 
a technique for establishing a trade-off between space and time. By 
storing the dependency metadata, the technique enables responding 
to small changes to data significantly faster by identifying and 
recomputing only the parts of the computation affected by the 
changes. It is natural to wonder whether it would be possible to 
control this trade-off so that, for example, a 1/B-th fraction (for 
some B) of the dependency metadata is stored at the expense of an 
increased update time, hopefully by no more than a factor of B. 

Our solution. To see how we might solve this problem, consider the following simple idea: partition the data into equal-sized blocks and treat each of these blocks as a unit of changeable computation at which dependencies are tracked. This intuitive idea is indeed simple and natural to implement. But there is a fundamental problem: fixed-size chunking is highly sensitive to small changes to the input. As a simple example, consider inserting a single element into, or deleting one from, a list of blocks. Such a change will cascade to all blocks in the list, preventing much of the prior computation from being reused. Even "in-place" changes, which the reader may feel would not cause this problem, are in fact unacceptable because they do not compose. Consider, for example, the output of the filter function, which takes an input list and outputs only the elements for which a certain predicate evaluates to true. Modifying an input element in place may drop or add an element in the output list, which can create a ripple effect to all the blocks. The main challenge in these examples lies in making sure the blocks remain stable under changes.

$$
\begin{array}{llcl}
\text{Levels} & \delta & ::= & S \mid C_\rho \\
\text{Types} & \tau & ::= & \mathsf{int}^\delta \mid (\tau_1 \times \tau_2)^\delta \mid (\tau_1 + \tau_2)^\delta \mid (\tau_1 \to \tau_2)^\delta \\
\text{Constraints} & C, D & ::= & \mathit{true} \mid \mathit{false} \mid \alpha = \beta \mid \alpha \le \beta \mid \delta \lhd \tau \mid \rho_1 = \rho_2
\end{array}
$$

Figure 2. Levels, types and constraints

We solve these problems by eliminating the intrinsic dependency between block boundaries and the data itself. More precisely, we propose a probabilistic chunking scheme that decides block boundaries using a (random) hash function, independently of the structure of the data, rather than deterministically. Using this technique, we are able to reduce the size of the dependency metadata by a factor of B in expectation by chunking the data into blocks of expected size B, while incurring only about a factor-of-B slowdown in update time.
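The contrast between fixed-size and probabilistic chunking can be checked with a small Python sketch. As an assumption, we end a block after element x whenever hash(x) mod B = 0, using SHA-256 with a fixed seed as a stand-in for the random hash function; with a uniformly random hash, blocks then have expected size B. Inserting one element at the front shifts every fixed-size block, but changes at most one probabilistic block, because the boundaries of all other elements are unaffected.

```python
import hashlib

def block_hash(x, seed=0):
    """Pseudo-random hash of an element (a stand-in for a random hash function)."""
    digest = hashlib.sha256(f"{seed}:{x!r}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def fixed_chunks(xs, B):
    """Blocks of exactly B elements (the last block may be shorter)."""
    return [tuple(xs[i:i + B]) for i in range(0, len(xs), B)]

def probabilistic_chunks(xs, B, seed=0):
    """End a block after x whenever block_hash(x) % B == 0."""
    blocks, current = [], []
    for x in xs:
        current.append(x)
        if block_hash(x, seed) % B == 0:
            blocks.append(tuple(current))
            current = []
    if current:
        blocks.append(tuple(current))
    return blocks

def changed_blocks(old, new):
    """Number of old blocks that no longer occur in the new chunking."""
    new_blocks = set(new)
    return sum(1 for b in old if b not in new_blocks)

B = 8
xs = list(range(1000))
ys = [-1] + xs  # insert a single element at the front
print(changed_blocks(fixed_chunks(xs, B), fixed_chunks(ys, B)))  # 125: every block shifts
print(changed_blocks(probabilistic_chunks(xs, B), probabilistic_chunks(ys, B)))  # at most 1
```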

3. Fine-grained Information Flow Types 

In this section, we derive a type system for self-adjusting compu- 
tation that can identify precisely which part of the data, down to 
individual attributes of a record or tuple, is changeable. In particu- 
lar, we extend the surface type system from previous work to track 
fine-grained dependencies in the surface language. 

The formalism rests on a simple insight that data that depends 
on changeable data must itself be changeable, similar to situations 
in information-flow type systems, where "secret" (high-security) 
data is infectious; therefore, any data that depends on secret data 
itself must be secret. 

To track dependencies precisely, we further distinguish different changeable data by giving them unique labels. Our types include a lattice of (security) levels: stable, and changeable with labels. We generally follow the approach and notation of Chen et al. [15, 17], except that we need not have a mode on function types.

Levels. Levels $S$ (stable) and $C_\rho$ (changeable) have a partial order:

$$ S \le \delta \qquad\qquad C_\rho \le C_\rho $$

Stable levels are lower than changeable levels; changeable levels with different labels are generally incomparable. Here, labels are used to distinguish different changeable data in the program. We also assume that labels with prefix 1 are lower than labels with prefix 0. This allows changeable data to flow into their corresponding destinations (labeled with prefix 0). We discuss subsumption in Section 4.

Types. Types consist of integers tagged with their levels, and products, sums, and arrow (function) types with an associated level, as shown in Figure 2. The label $\rho$ associated with each changeable level denotes fine-grained dependencies among changeables: two changeables with the same label have a dependency between them.

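Labels. Labels are identifiers for changeable data. To facilitate translation into a destination-passing style, we use particular binary-encoded labels that identify each label with its destination. This binary encoding works in concert with the relation $\tau \mid_\rho D; L$ in Figure 3, which recursively determines the labels with respect to a prefix $\rho$, where the types of the destinations and the destination names are stored in $D$ and $L$, respectively. For a stable product, rule (#prodS), we label it based on the structure of the product. Specifically, we append 0 if the changeable level is in the left part of a product, and we append 1 if the changeable level is in the right part of a product. For changeable level types, we require that the outer level label is $\rho$; the relation does not restrict the inner labels. For stable integers, sums, and arrows, we do not look into the type structure; their inner changeable types can be labeled arbitrarily. As an example, $\tau = (\mathsf{int}^{C_{00}} \times (\mathsf{int}^{C_{010}} + \mathsf{int}^S)^{C_{01}})^S$ is validly labeled, with $\tau \mid_0 D; L$. The types of the destinations are $D = \{\mathsf{int}, (\mathsf{int}^{C_{010}} + \mathsf{int}^S)\}$, and the destination names are $L = \{l_{00}, l_{01}\}$.

$$ \mathsf{int}^S \mid_\rho \emptyset; \emptyset \quad \text{(\#intS)} \qquad\qquad \mathsf{int}^{C_\rho} \mid_\rho \{\mathsf{int}\}; \{l_\rho\} \quad \text{(\#intC)} $$

$$ \frac{\tau_1 \mid_{\rho 0} D; L \qquad \tau_2 \mid_{\rho 1} D'; L'}{(\tau_1 \times \tau_2)^S \mid_\rho D \cup D'; L \cup L'} \quad \text{(\#prodS)} \qquad\qquad (\tau_1 \times \tau_2)^{C_\rho} \mid_\rho \{(\tau_1 \times \tau_2)\}; \{l_\rho\} \quad \text{(\#prodC)} $$

$$ (\tau_1 + \tau_2)^S \mid_\rho \emptyset; \emptyset \quad \text{(\#sumS)} \qquad\qquad (\tau_1 + \tau_2)^{C_\rho} \mid_\rho \{(\tau_1 + \tau_2)\}; \{l_\rho\} \quad \text{(\#sumC)} $$

$$ (\tau_1 \to \tau_2)^S \mid_\rho \emptyset; \emptyset \quad \text{(\#funS)} \qquad\qquad (\tau_1 \to \tau_2)^{C_\rho} \mid_\rho \{(\tau_1 \to \tau_2)\}; \{l_\rho\} \quad \text{(\#funC)} $$

Figure 3. Labeling changeable types

$$ \lceil \mathsf{int}^\delta \rceil = \lceil (\tau_1 \times \tau_2)^\delta \rceil = \lceil (\tau_1 + \tau_2)^\delta \rceil = \lceil (\tau_1 \to \tau_2)^\delta \rceil = \delta $$

$$ \mathsf{int}^\delta \approx \mathsf{int}^{\delta'} \qquad (\tau_1 \times \tau_2)^\delta \approx (\tau_1 \times \tau_2)^{\delta'} \qquad (\tau_1 + \tau_2)^\delta \approx (\tau_1 + \tau_2)^{\delta'} \qquad (\tau_1 \to \tau_2)^\delta \approx (\tau_1 \to \tau_2)^{\delta'} $$

Figure 6. Outer level of types, and equality up to outer levels

$$
\begin{array}{llcl}
\text{Values} & v & ::= & n \mid x \mid (v_1, v_2) \mid \mathsf{inl}\ v \mid \mathsf{inr}\ v \mid \mathsf{fun}\ f(x) = e \\
\text{Expressions} & e & ::= & v \mid \oplus(x_1, x_2) \mid \mathsf{fst}\ x \mid \mathsf{snd}\ x \mid \mathsf{case}\ x\ \mathsf{of}\ \{x_1 \Rightarrow e_1, x_2 \Rightarrow e_2\} \\
 & & \mid & \mathsf{apply}(x_1, x_2) \mid \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2
\end{array}
$$

Figure 7. Abstract syntax of the source language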
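The labeling relation of Figure 3 can be read as a function that computes the destinations of a type from a prefix. The Python sketch below assumes a hypothetical tuple encoding of types, ("int", level) or (kind, level, left, right), with levels written "S" or "C" and inner labels omitted (the relation does not restrict them); it reproduces the labeling example given above.

```python
# Types: ("int", level) or (kind, level, left, right) with kind in {"prod", "sum", "arrow"}.
# Levels are "S" or "C"; inner labels are omitted because the relation does not restrict them.

def show(t, outer=True):
    """Render a type; with outer=False the outermost level is dropped, as in D."""
    kind, level = t[0], t[1]
    if kind == "int":
        body = "int"
    else:
        op = {"prod": " x ", "sum": " + ", "arrow": " -> "}[kind]
        body = "(" + show(t[2]) + op + show(t[3]) + ")"
    return body + "^" + level if outer else body

def destinations(t, p):
    """Return the (destination name, destination type) pairs for t under prefix p."""
    kind, level = t[0], t[1]
    if level == "C":                  # (#intC), (#prodC), (#sumC), (#funC)
        return [("l" + p, show(t, outer=False))]
    if kind == "prod":                # (#prodS): left component gets p0, right gets p1
        return destinations(t[2], p + "0") + destinations(t[3], p + "1")
    return []                         # (#intS), (#sumS), (#funS): no destinations

# The example from the text: (int^C00 x (int^C010 + int^S)^C01)^S with prefix 0.
tau = ("prod", "S", ("int", "C"), ("sum", "C", ("int", "C"), ("int", "S")))
print(destinations(tau, "0"))  # [('l00', 'int'), ('l01', '(int^C + int^S)')]
```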

Subtyping. Figure 4 shows the subtyping relation $\tau <: \tau'$, which is standard except for the levels: it requires that the outer level of the subtype be no higher than the outer level of the supertype.
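A sketch of the subtyping check in Python, under two stated assumptions: labels are ignored, so levels are ordered simply as S ≤ C, and arrow types use the usual contravariance in the argument. The tuple encoding of types is hypothetical.

```python
# Hypothetical encoding: ("int", level) or (kind, level, left, right),
# kind in {"prod", "sum", "arrow"}; labels are ignored, so levels are just "S" or "C".
LEVEL_ORDER = {"S": 0, "C": 1}

def level_leq(d1, d2):
    """S <= S, S <= C, C <= C."""
    return LEVEL_ORDER[d1] <= LEVEL_ORDER[d2]

def subtype(t1, t2):
    """Decide t1 <: t2 following the shape of (subInt), (subProd), (subSum), (subArr)."""
    if t1[0] != t2[0] or not level_leq(t1[1], t2[1]):
        return False
    if t1[0] == "int":
        return True
    if t1[0] == "arrow":  # argument contravariant, result covariant (assumed)
        return subtype(t2[2], t1[2]) and subtype(t1[3], t2[3])
    return subtype(t1[2], t2[2]) and subtype(t1[3], t2[3])  # prod, sum: covariant

INT_S, INT_C = ("int", "S"), ("int", "C")
print(subtype(INT_S, INT_C), subtype(INT_C, INT_S))  # True False
# A deep-subsumption example: (int^C -> int^S)^S <: (int^S -> int^C)^C
print(subtype(("arrow", "S", INT_C, INT_S), ("arrow", "C", INT_S, INT_C)))  # True
```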

Levels and types. We need relations between levels and types to ensure certain invariants. A type $\tau$ is higher than $\delta$, written $\delta \lhd \tau$, if the outer level of the type is at least $\delta$. In other words, $\delta$ is a lower bound of the outer level of $\tau$. For products with outer stable levels, we check whether each component is higher than $\delta$. Note that we do not check the components of a stable sum type. Figure 5 defines this relation.

We define an outer-level operation $\lceil \tau \rceil$ that derives the outer level of a type in Figure 6. Finally, two types $\tau_1$ and $\tau_2$ are equal up to their outer levels, written $\tau_1 \approx \tau_2$, if $\tau_1 = \tau_2$ or they differ only in their outer levels.
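The outer-level operation, the lower-bound relation $\delta \lhd \tau$, and equality up to outer levels can likewise be sketched in Python, again ignoring labels (so S ≤ C) and using a hypothetical tuple encoding of types. Note how a stable product is higher than C when both of its components are, while a stable sum is not.

```python
# Hypothetical encoding: ("int", level) or (kind, level, left, right); levels "S" or "C".
LEVEL_ORDER = {"S": 0, "C": 1}

def level_leq(d1, d2):
    return LEVEL_ORDER[d1] <= LEVEL_ORDER[d2]

def outer(t):
    """The outer level of a type."""
    return t[1]

def higher(delta, t):
    """delta <| t: t's outer level is at least delta, or t is a stable product
    whose components are both higher than delta (stable sums are not inspected)."""
    if level_leq(delta, outer(t)):
        return True
    if t[0] == "prod" and outer(t) == "S":
        return higher(delta, t[2]) and higher(delta, t[3])
    return False

def equal_up_to_outer(t1, t2):
    """t1 ~ t2: same constructor and identical components; outer levels may differ."""
    return t1[0] == t2[0] and t1[2:] == t2[2:]

INT_S, INT_C = ("int", "S"), ("int", "C")
print(higher("C", INT_S))                        # False
print(higher("C", ("prod", "S", INT_C, INT_C)))  # True
print(higher("C", ("sum", "S", INT_C, INT_C)))   # False
print(equal_up_to_outer(INT_S, INT_C))           # True
```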



4. Source Language 

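Abstract syntax. Figure 7 shows the syntax of our source language, a purely functional language with integers (as base types), products, and sums. The expressions consist of values (integers, pairs, tagged values, and recursive functions), projections, case expressions, function applications, and let bindings. For convenience, we consider only expressions in A-normal form, which names intermediate results. A-normal form simplifies some technical issues while maintaining expressiveness.

$$ \frac{\delta \le \delta'}{\mathsf{int}^\delta <: \mathsf{int}^{\delta'}}\ (\text{subInt}) \qquad \frac{\tau_1 <: \tau_1' \quad \tau_2 <: \tau_2' \quad \delta \le \delta'}{(\tau_1 \times \tau_2)^\delta <: (\tau_1' \times \tau_2')^{\delta'}}\ (\text{subProd}) $$

$$ \frac{\tau_1 <: \tau_1' \quad \tau_2 <: \tau_2' \quad \delta \le \delta'}{(\tau_1 + \tau_2)^\delta <: (\tau_1' + \tau_2')^{\delta'}}\ (\text{subSum}) \qquad \frac{\tau_1' <: \tau_1 \quad \tau_2 <: \tau_2' \quad \delta \le \delta'}{(\tau_1 \to \tau_2)^\delta <: (\tau_1' \to \tau_2')^{\delta'}}\ (\text{subArr}) $$

Figure 4. Subtyping

$$ \frac{\delta \le \delta'}{\delta \lhd \mathsf{int}^{\delta'}}\ (\lhd\text{-Int}) \qquad \frac{\delta \le \delta'}{\delta \lhd (\tau_1 \times \tau_2)^{\delta'}}\ (\lhd\text{-Prod}) \qquad \frac{\delta \lhd \tau_1 \quad \delta \lhd \tau_2}{\delta \lhd (\tau_1 \times \tau_2)^S}\ (\lhd\text{-InnerProd}) $$

$$ \frac{\delta \le \delta'}{\delta \lhd (\tau_1 + \tau_2)^{\delta'}}\ (\lhd\text{-Sum}) \qquad \frac{\delta \le \delta'}{\delta \lhd (\tau_1 \to \tau_2)^{\delta'}}\ (\lhd\text{-Arrow}) $$

Figure 5. Lower bound of a type

$C; P; \Gamma \vdash e : \tau$: under constraint $C$, label set $P$, and source typing environment $\Gamma$, source expression $e$ has type $\tau$.

$$ \frac{}{C; P; \Gamma \vdash n : \mathsf{int}^S}\ (\text{SInt}) \qquad \frac{\Gamma(x) = \tau}{C; P; \Gamma \vdash x : \tau}\ (\text{SVar}) $$

$$ \frac{C; P; \Gamma \vdash v_1 : \tau_1 \quad C; P; \Gamma \vdash v_2 : \tau_2}{C; P; \Gamma \vdash (v_1, v_2) : (\tau_1 \times \tau_2)^S}\ (\text{SPair}) $$

$$ \frac{C; P; \Gamma \vdash v : \tau_1}{C; P; \Gamma \vdash \mathsf{inl}\ v : (\tau_1 + \tau_2)^S}\ (\text{SSum}) \qquad \frac{C; P; \Gamma \vdash x : (\tau_1 \times \tau_2)^S}{C; P; \Gamma \vdash \mathsf{fst}\ x : \tau_1}\ (\text{SFst}) $$

$$ \frac{C; \{\rho\}; \Gamma, x : \tau_1, f : (\tau_1 \to \tau_2)^S \vdash e : \tau_2 \quad \lceil\tau_1\rceil = C_\rho \quad C \Vdash \tau_2 \mid_0 D; L}{C; P; \Gamma \vdash (\mathsf{fun}\ f(x) = e) : (\tau_1 \to \tau_2)^S}\ (\text{SFun}) $$

$$ \frac{C; P; \Gamma \vdash x_1 : \mathsf{int}^{\delta_1} \quad C; P; \Gamma \vdash x_2 : \mathsf{int}^{\delta_2} \quad \vdash \oplus : \mathsf{int} \times \mathsf{int} \to \mathsf{int} \quad C \Vdash \delta_1 \le \delta \wedge \delta_2 \le \delta}{C; P; \Gamma \vdash \oplus(x_1, x_2) : \mathsf{int}^\delta}\ (\text{SPrim}) $$

$$ \frac{C; P; \Gamma \vdash e_1 : \tau' \quad C \Vdash \tau' <: \tau'' \quad C \Vdash \tau' \approx \tau'' \quad \lceil\tau''\rceil = C_\rho \quad C \Vdash \rho \notin P \quad C; P \cup \{\rho\}; \Gamma, x : \tau'' \vdash e_2 : \tau}{C; P; \Gamma \vdash \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 : \tau}\ (\text{SLet}) $$

$$ \frac{C; P; \Gamma \vdash x_1 : (\tau_1 \to \tau_2)^\delta \quad C; P; \Gamma \vdash x_2 : \tau_1 \quad C \Vdash \delta \lhd \tau_2}{C; P; \Gamma \vdash \mathsf{apply}(x_1, x_2) : \tau_2}\ (\text{SApp}) $$

$$ \frac{C; P; \Gamma \vdash x : (\tau_1 + \tau_2)^\delta \quad C; P; \Gamma, x_1 : \tau_1 \vdash e_1 : \tau \quad C; P; \Gamma, x_2 : \tau_2 \vdash e_2 : \tau \quad C \Vdash \delta \lhd \tau}{C; P; \Gamma \vdash \mathsf{case}\ x\ \mathsf{of}\ \{x_1 \Rightarrow e_1, x_2 \Rightarrow e_2\} : \tau}\ (\text{SCase}) $$

Figure 8. Typing rules for source language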

Constraint-based type system. The type system has the fine-grained level-decorated types and constraints (Figure 2) described in Section 3. After discussing the rules themselves, we will look at type inference.

The typing judgment $C; P; \Gamma \vdash e : \tau$ has a constraint $C$, a label set $P$ (storing used label names), and a typing environment $\Gamma$, and infers type $\tau$ for expression $e$. Our work extends the type system in Chen et al. [15, 17] with labels. Although most of the typing rules remain the same, there are two major differences: (1) the source typing judgment no longer has a mode; (2) our generalization has a label set in the typing rules to make sure the labels inside a function are unique. Furthermore, our generalization of changeable levels with labels does not affect inferring level-polymorphic types. To simplify the presentation, we assume the source language presented here is level monomorphic.

The typing rules for variables (SVar), integers (SInt), pairs (SPair), sums (SSum), primitive operations (SPrim), and projections (SFst) are standard. (We omit the symmetric rules for inr v and snd x.) To type a function (SFun), we type the body as specified by the function type $(\tau_1 \to \tau_2)^S$. The changeable types in the return type are translated into destinations in the target language. To facilitate the translation, we need to fix the destination labels in the return type via $\tau_2 \mid_0 D; L$, where we assume destination labels all have prefix 0. We also assume that non-destination labels, e.g., labels for changeable input, have prefix 1. Note that labels are scoped to a single function: labels in different functions need not be unique. We omit the simpler rule for $\lceil\tau_1\rceil = S$.

As in previous work, we allow subsumption only at let binding (SLet), e.g., from a bound expression $e_1$ of subtype $\mathsf{int}^S$ to an assumption $x : \mathsf{int}^{C_\rho}$. Note that when binding an expression to a variable with a changeable level, the label $\rho$ must be either unique or one of the labels from the destination. The subtyping relation allows changeable labels with prefix 1 to be "promoted" to labels with prefix 0. This restriction makes sure that input data can flow to destinations and that the information-flow type system tracks dependencies correctly. We omit the simpler rule for $\lceil\tau''\rceil = S$. As in previous work, we subsume only when the subtype and supertype are equal up to their outer levels. This simplifies the translation with no loss of expressiveness: to handle "deep" subsumption, such as $(\mathsf{int}^{C_\rho} \to \mathsf{int}^S)^S <: (\mathsf{int}^S \to \mathsf{int}^{C_\rho})^{C_{\rho'}}$, we can insert coercions into the source program before typing it with these rules. (This process could easily be automated.)

A function application (SApp) requires that the result of the function be higher than the function's level: if a function is itself changeable, $(\tau_1 \to \tau_2)^{C_\rho}$, then it could be replaced by another function, and thus the result of this application must be changeable. Due to let-subsumption, checking this in (SFun) alone is not enough. Similarly, in rule (SCase) for typing a case expression, we ensure that the result type $\tau$ is also higher than $\delta$: if the scrutinee changes, we may take the other branch, requiring a changeable result.

Constraints and type inference. Our rules and constraints fall within the HM(X) framework [44], permitting inference of principal
types via constraint solving. Although our type system requires 
explicit labels for changeable levels, these labels can be inferred 
automatically. The user does not need to provide explicit labels when 
programming in the surface language. In all, we extend the type 
system with fine-grained dependency tracking without any burden 
on the programmer. 

5. Target Language 

Abstract syntax. The target language (Figure 9) is an imperative self-adjusting language with modifiables. In addition to integers, units, products, and sums, the target type system makes a distinction between fresh modifiable types $\square\,\mathsf{int}$ (modifiables that are freshly allocated) and finalized modifiable types $\mathsf{int}\ \mathsf{mod}$ (modifiables that have been written after allocation). The function type $\tau_1 \xrightarrow{D} \tau_2$ contains an ordered set of destination types $D$, indicating the types of the destinations of the function.

Figure 9. Types and expressions in the target language

The variables consist of labels $l_i$ and ordinary variables $y$, which are drawn from different syntactic categories. Label variables $l_i$ are used as bindings for destinations.

The values of the language consist of integers, variables, locations $\ell$ (which appear only at run time), pairs, tagged values, and functions. Each function $\mathsf{fun}^{\overline{l}}\, f(x) = e$ takes an ordered label set $\overline{l}$, which contains the destination modifiables $l_i$ that must be filled in before the function returns. An empty $\overline{l}$ indicates that the function returns only stable values and therefore takes no destinations.

The expression $\mathsf{apply}^{\overline{l}}(x_1, x_2)$ applies a function $x_1$ to an argument $x_2$ while supplying a set of destination modifiables $\overline{l}$. The construct $\mathsf{mod}\ v$ creates a new fresh modifiable of type $\Box\tau$ with initial value $v$. The read expression binds the contents of a modifiable $x$ to a variable $y$ and evaluates the body of the read. The write construct $\mathsf{write}(x_1, x_2)$ imperatively updates the modifiable $x_1$ with value $x_2$; it can update both the destination modifiables $\overline{l}$ and modifiables created by $\mathsf{mod}$.

Static semantics. The typing rules in Figure 10 follow the structure of the expressions. Rules (TLoc), (TInt), (TVar), (TPair), (TSum), (TFst), and (TPrim) are standard. Given an initial value $x$ of type $\tau$, rule (TAlloc) creates a fresh modifiable of type $\Box\tau$. The type system guarantees that this initial value is never read; it is provided only to determine the type of the modifiable, which keeps the type system sound. Rule (TWrite) writes a value $x_2$ of type $\tau$ into a modifiable $x_1$ when $x_1$ is a fresh modifiable of type $\Box\tau$, and produces a new typing environment in which the type of $x_1$ is replaced by the finalized modifiable type $\tau\ \mathsf{mod}$. Because (TWrite) only allows writing into a fresh modifiable, each modifiable can be written only once. Intuitively, mod and write split the creation of a value in a purely functional language into two steps: allocating a location and initializing it. This separation is critical for writing programs in destination-passing style. Rule (TRead) ensures that a modifiable can be read only after it has been written, that is, when its type is $\tau\ \mathsf{mod}$.
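To make the fresh/finalized discipline concrete, the Python sketch below checks dynamically what (TAlloc), (TWrite), and (TRead) guarantee statically. It is only an illustration under stated assumptions: the class `Modifiable` and its methods are hypothetical and are not part of the paper's SML library.

```python
class Modifiable:
    """Hypothetical write-once cell: fresh (like a box type) until written, then finalized (like tau mod)."""

    def __init__(self, placeholder):
        # (TAlloc): the placeholder only fixes the modifiable's type; it is never read.
        self._type = type(placeholder)
        self._value = None
        self.finalized = False

    def write(self, value):
        # (TWrite): only a fresh modifiable may be written, so each is written at most once.
        if self.finalized:
            raise RuntimeError("modifiable is already finalized")
        if not isinstance(value, self._type):
            raise TypeError("value does not match the modifiable's type")
        self._value = value
        self.finalized = True

    def read(self):
        # (TRead): only a finalized modifiable may be read.
        if not self.finalized:
            raise RuntimeError("cannot read a fresh modifiable")
        return self._value


m = Modifiable(0)   # like `mod 0`: a fresh int modifiable
m.write(42)         # now finalized
print(m.read())     # 42
```

A second `m.write(...)` or a `read()` before the first write raises an error, mirroring the cases the type system rejects.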

Rule (TLet) uses the typing environment produced by the let binding to check $e_2$, which allows the type system to keep track of the effects of writes in the binding. To ensure correct use of the self-adjusting constructs, rule (TCase) imposes a conservative restriction: each branch must have the same result type and produce the same typing environment. In other words, each branch must write to the same set of modifiables: if a modifiable $x$ is finalized in one branch, the other branch must also finalize it.

Figure 10. Typing rules of the target language
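The environment threading of (TLet) and the branch-agreement check of (TCase) can be sketched as a small Python function over a hypothetical mini-AST of target expressions. The AST shapes and the name `effects` are illustrative assumptions; the environment records only whether each modifiable is fresh or finalized, not its full type.

```python
from typing import Dict, Tuple

Env = Dict[str, str]  # modifiable name -> "fresh" or "final"


def effects(expr: Tuple, env: Env) -> Env:
    """Return the environment after expr, in the spirit of (TLet) and (TCase).

    Hypothetical mini-AST:
      ("mod", x)           allocate a fresh modifiable named x
      ("write", x)         finalize x (x must be fresh)
      ("read", x, body)    read x (x must be final), then check body
      ("let", e1, e2)      check e2 in the environment produced by e1
      ("case", e1, e2)     both branches must produce the same environment
      ("unit",)            no effect
    """
    tag = expr[0]
    if tag == "mod":
        return {**env, expr[1]: "fresh"}
    if tag == "write":
        if env.get(expr[1]) != "fresh":
            raise TypeError(f"write to non-fresh modifiable {expr[1]}")
        return {**env, expr[1]: "final"}
    if tag == "read":
        if env.get(expr[1]) != "final":
            raise TypeError(f"read of non-finalized modifiable {expr[1]}")
        return effects(expr[2], env)
    if tag == "let":
        return effects(expr[2], effects(expr[1], env))
    if tag == "case":
        left, right = effects(expr[1], env), effects(expr[2], env)
        if left != right:
            raise TypeError("case branches finalize different modifiables")
        return left
    if tag == "unit":
        return env
    raise ValueError(f"unknown expression {tag!r}")
```

For instance, a case whose first branch writes `l` while the second does nothing is rejected, which is exactly the situation (TCase) rules out.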

Rule (TFun) imposes two requirements on a function. (1) The destination types $D$ must be fresh modifiable types, and the argument type must not contain fresh modifiables; intuitively, the function's arguments are partitioned into two parts, destinations and ordinary arguments. (2) The function body $e$ must finalize every destination modifiable in $\overline{l}$. This can be achieved either by explicitly writing into these modifiables or by passing them to another function that takes responsibility for writing actual values to them. Although all modifiables in $\overline{l}$ must be finalized, other modifiables created inside the function body may remain fresh, as long as the body does not read them.

Rule (TApp) applies a function with fresh modifiables $\overline{l}$. The types of these modifiables must be the same as the destination types $D$ in the function type. The rule produces a new typing environment that guarantees that all the supplied destination modifiables are finalized after the function application.

Figure 11. Translations $\|\tau\|$ of types and typing environments
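The two requirements of (TFun) and the environment update of (TApp) can be sketched as follows. Both functions are hypothetical illustrations; for brevity they check only the number of destinations against $D$, not their element types.

```python
from typing import Dict, List, Set


def check_fun(destinations: List[str], finalized_by_body: Set[str]) -> None:
    """(TFun) sketch: the body must finalize every destination in the label set."""
    missing = [label for label in destinations if label not in finalized_by_body]
    if missing:
        raise TypeError(f"destinations not finalized by the body: {missing}")


def check_app(env: Dict[str, str], supplied: List[str], arity: int) -> Dict[str, str]:
    """(TApp) sketch: supply `arity` fresh destinations; afterwards they are finalized."""
    if len(supplied) != arity:
        raise TypeError("number of destinations does not match D")
    new_env = dict(env)
    for label in supplied:
        if env.get(label) != "fresh":
            raise TypeError(f"destination {label} is not a fresh modifiable")
        new_env[label] = "final"
    return new_env
```

Note that `check_fun` accepts a body that also leaves some other modifiable (here a hypothetical `tmp`) fresh, matching the remark that non-destination modifiables may stay fresh if they are never read.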

Dynamic semantics. The dynamic semantics of our target language matches that of Acar et al. [4] after two syntactic changes: $\mathsf{fun}^{\overline{l}}\, f(x) = e$ is represented as $\mathsf{fun}\ f(x) = \lambda\overline{l}.\, e$, and $\mathsf{apply}^{\overline{l}}(x_1, x_2)$ is represented as $(x_1\ x_2)\ \overline{l}$.
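The curried representation can be illustrated with ordinary closures. In this Python sketch, a one-element list stands in for a modifiable, and `succ` is a hypothetical destination-passing function; it only shows the shape $\mathsf{fun}\ f(x) = \lambda\overline{l}.\, e$, not the paper's runtime.

```python
from typing import Callable, List

Cell = List[int]  # hypothetical stand-in for a modifiable: a one-element list


def succ(x: int) -> Callable[[Cell], Cell]:
    """fun^{l} succ(x) = write(l, x + 1), represented as fun succ(x) = λl. write(l, x + 1)."""
    def with_destination(dest: Cell) -> Cell:
        dest[0] = x + 1   # fill the destination before returning
        return dest
    return with_destination


# apply^{l}(succ, 41) is represented as (succ 41) l
l_out: Cell = [0]
succ(41)(l_out)
print(l_out[0])  # 42
```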

6. Translation 

This section gives a high-level overview of the translation from the source language to the target self-adjusting language. To ensure type safety, we translate types and expressions together using a type-directed translation. Since the source and target languages have different type systems, an expression $e : \tau$ cannot simply be translated to a target expression $e'$ of the same type $\tau$; the type must be translated as well, producing some $e' : \tau'$ where $\tau'$ is a target type that corresponds to $\tau$. We therefore developed the translation of expressions and types together, along with the proof that the desired property holds. To understand how expressions are translated, it is helpful to first understand how types are translated.

6.1 Translating types. 

Figure 11 defines $\|\tau\|$, the translation of types from the source language into target types; we also use it to translate the types in a typing environment $\Gamma$. For integers, sums, and products with stable levels, we simply erase the level annotation $S$ and apply the translation recursively to the type structure. For arrow types, we need to derive the destination types. In the source typing, we fix the destination type labels from the result type $\tau_2$, where $D$ stores the source types of the destinations. Therefore, the destination types of the target arrow type are $\|D\|$.

For source types with changeable levels, the target type is a modifiable. Since the source language is purely functional, the final result is always a finalized modifiable of type $\tau\ \mathsf{mod}$. We define a stabilization function $|\tau|^{S}$ for changeable source types, which changes the outer level of $\tau$ from changeable to stable. Formally,

$$|\tau|^{S} = \tau', \quad \text{where } \lfloor\tau\rfloor = C_\rho,\ \lfloor\tau'\rfloor = S, \text{ and } \tau \text{ and } \tau' \text{ are equal up to their outer levels.}$$

Then the target type for a source type $\tau$ with a changeable level is $\|\,|\tau|^{S}\|\ \mathsf{mod}$.
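The type translation can be sketched as a recursive function over a small tuple encoding of source types whose last component is the outer level (`"S"` or a changeable label such as `("C", "01")`). The encoding and the textual rendering of target types are assumptions made for illustration. In particular, rendering destinations as fresh modifiables (`□τ`) follows the requirement of (TFun), not a definition in Figure 11.

```python
def stabilize(t: tuple) -> tuple:
    """|tau|^S: the same type with its outer level set to stable."""
    return t[:-1] + ("S",)


def translate(t: tuple) -> str:
    """Sketch of ||tau||. Types are tuples whose last element is the outer level:
    ("int", lvl), ("prod", t1, t2, lvl), ("sum", t1, t2, lvl), ("arrow", t1, t2, dests, lvl)."""
    if t[-1] != "S":                                   # changeable: || |tau|^S || mod
        return f"{translate(stabilize(t))} mod"
    kind = t[0]
    if kind == "int":
        return "int"
    if kind == "prod":
        return f"({translate(t[1])} * {translate(t[2])})"
    if kind == "sum":
        return f"({translate(t[1])} + {translate(t[2])})"
    if kind == "arrow":
        # Assumption: destinations are rendered as fresh modifiables, following (TFun).
        dests = ", ".join("□" + translate(stabilize(d)) for d in t[3])
        return f"({translate(t[1])} -[{dests}]-> {translate(t[2])})"
    raise ValueError(f"unknown type {kind!r}")


c01 = ("C", "01")
print(translate(("arrow", ("int", "S"), ("int", c01), [("int", c01)], "S")))
# (int -[□int]-> int mod)
```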

6.2 Translating Expressions 

We define the translation of expressions as a set of type-directed rules. Given (1) a derivation of $C; \Psi; \Gamma \vdash e : \tau$ in the constraint-based typing system and (2) a satisfying assignment $\varphi$ for $C$, it is always possible to produce a correctly typed target expression $e_t$ (see Theorem 6.1 below). The environment $\Gamma$ in the translation rules is a source-typing environment and must have no free level variables. Given an environment $\Gamma$ from the constraint typing, we apply the satisfying assignment $\varphi$ to eliminate its free level variables before using it in the translation, as $[\varphi]\Gamma$. With the environment closed, we need not refer to $C$.

Our rules are nondeterministic, avoiding the need to "decorate" them with context-sensitive details.

Figure 12. Translation for destination passing style

Direct rules. The rules (Int), (Var), (Pair), (Sum), (Fst), and (Prim) follow the structure of the expression and translate it directly.

Changeable rules. The rules (Lift), (Mod), and (Write) translate expressions whose outer level is changeable, $C_\rho$. Given a translation of $e$ to some pure expression $e'$, rule (Write) translates $e$ into an imperative write expression that writes $e'$ into the modifiable $l_\rho$.

For expressions with non-destination changeable levels, that is, when the label $\rho$ has prefix 1, we need to create a modifiable first. Rules (Lift) and (Mod) achieve this. (Mod) is the simpler of the two: if $e$ translates to $e'$ at type $\tau$, then $e$ translates to the corresponding mod expression at type $\tau$. To obtain an initial value for the modifiable, we define a function $\tau \downarrow v$ that takes a source type $\tau$ and returns some value $v$ of that type. The initial value is only a placeholder and is never read, so the choice of value does not matter. In (Lift), the expression is translated not at the given type $\tau$ but at its stabilized type $|\tau|^{S}$, capturing the "shallow subsumption" in the constraint typing rule (SLet): a bound expression of type $\tau^{S}$ can be translated at that type to $e'$ and then "promoted" to type $\tau^{C_\rho}$ by placing it inside a modifiable $l_\rho$.

Figure 13. Renaming the variable to be read
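The sketch below pairs a hypothetical placeholder function in the role of $\tau \downarrow v$ with a string-level rendering of (Mod)/(Lift). The exact output shape (`let l_rho = mod v in write(l_rho, e')`) is an assumption chosen for readability, not the paper's rule output. Types are encoded as tuples whose first component names the type constructor, and levels are omitted.

```python
def default_value(t: tuple) -> str:
    """Hypothetical tau ↓ v: some value of type tau, used only as a never-read placeholder."""
    kind = t[0]
    if kind == "int":
        return "0"
    if kind == "unit":
        return "()"
    if kind == "prod":
        return f"({default_value(t[1])}, {default_value(t[2])})"
    if kind == "sum":
        return f"inl {default_value(t[1])}"
    raise ValueError(f"no placeholder chosen for {kind!r}")


def translate_mod(t: tuple, rho: str, e_prime: str) -> str:
    """(Mod)/(Lift) sketch: allocate l_rho with a placeholder, then write e' into it."""
    return f"let l_{rho} = mod {default_value(t)} in write(l_{rho}, {e_prime})"


print(translate_mod(("int",), "1", "x + 1"))
# let l_1 = mod 0 in write(l_1, x + 1)
```

Any value of the right type would do for the placeholder, since the target type system guarantees it is never read.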

Reading from changeable data. To use an expression of changeable type in a context where a stable value is needed, such as passing some $x : \mathsf{int}^{C_\rho}$ to a function expecting $\mathsf{int}^{S}$, the (Read) rule generates a target expression that reads the value out of $x : \mathsf{int}^{C_\rho}$ into a variable $x' : \mathsf{int}^{S}$. The variable-renaming judgment $\Gamma \vdash e \rightsquigarrow (x \gg x' : \tau \vdash e')$ takes the expression $e$, finds a variable $x$ about to be used, and yields an expression $e'$ with that occurrence replaced by $x'$. For example, $\Gamma \vdash \mathsf{case}\ x\ \mathsf{of} \ldots \rightsquigarrow (x \gg x' : \tau \vdash \mathsf{case}\ x'\ \mathsf{of} \ldots)$. This judgment is derivable only for variables, apply, case, fst, and $\oplus$. For $\oplus(x_1, x_2)$, we need to read both variables; we omit the symmetric rules for reading the second variable. The rules are given in Figure 13.

Figure 14. Deriving destination return
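A string-level sketch of the (Read) rule follows. Here the callback `use` plays the role of the renaming judgment: it rebuilds the expression around whichever variable it is handed. The function name and the priming convention for fresh names are illustrative assumptions, and `+` stands in for an arbitrary primitive $\oplus$.

```python
from typing import Callable


def read_rule(x: str, use: Callable[[str], str]) -> str:
    """(Read) sketch: read changeable x into a fresh stable x', then build the
    expression with x' in place of x. `use` plays the role of the renaming
    judgment: it rebuilds e' from whatever variable it is handed."""
    x_prime = x + "'"
    return f"read {x} as {x_prime} in {use(x_prime)}"


# case x of ...   ==>   read x as x' in case x' of ...
print(read_rule("x", lambda v: f"case {v} of ..."))

# x1 + x2 reads both operands, first x1, then x2
print(read_rule("x1", lambda a: read_rule("x2", lambda b: f"{a} + {b}")))
# read x1 as x1' in read x2 as x2' in x1' + x2'
```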

Function and application rules. Since the self-adjusting primitives are imperative, an expression with an outer changeable level is translated into a target expression that returns unit. To recover the function's return type in the target language, we need to wrap the destinations so that the function returns a value of the correct type. Figure 14 shows the rules for translating the function body and wrapping the destinations. For a tuple expression (RPair), the translation returns the destination for each component. For a case expression (RCase), it is enough to return the destinations from one of the branches, since the source typing rule (SCase) guarantees that both branches write to the same destinations. When the expression has an outer changeable level $C_\rho$, rule (RMod) returns its modifiable variable $l_\rho$. For let bindings, rule (RLet) translates all the bindings in the usual way and derives destinations for the expression in tail position. For all other expressions, the translation simply switches to the ordinary translation rules in Figure 12. For example, the expression $(1, x) : (\mathsf{int}^{S} \times \mathsf{int}^{C_{01}})$ is translated to $(1, l_{01})$ by applying rules (RProd), (RTrans), (Int), and (RMod).
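The destination-return rules can be sketched as a recursive function over a tiny, hypothetical expression encoding in which every node ends with its outer level. Only pairs, case, integers, and variables are covered, and `ordinary` is a stand-in for the full ordinary translation of Figure 12.

```python
def ordinary(e: tuple) -> str:
    """Stand-in for the ordinary translation (RTrans) on stable leaves."""
    if e[0] == "int":
        return str(e[1])
    if e[0] == "var":
        return e[1]
    raise ValueError(f"no ordinary translation sketched for {e[0]!r}")


def dest_return(e: tuple) -> str:
    """Destination-return sketch. Nodes end with their outer level ("S" or a label):
    ("int", n, lvl), ("var", x, lvl), ("pair", e1, e2, lvl), ("case", x, e1, e2, lvl)."""
    level = e[-1]
    if level != "S":                     # (RMod): changeable C_rho returns l_rho
        return f"l_{level}"
    if e[0] == "pair":                   # (RPair): destinations of each component
        return f"({dest_return(e[1])}, {dest_return(e[2])})"
    if e[0] == "case":                   # (RCase): one branch suffices
        return dest_return(e[2])
    return ordinary(e)                   # (RTrans): switch to the ordinary rules


print(dest_return(("pair", ("int", 1, "S"), ("var", "x", "01"), "S")))  # (1, l_01)
```

The printed result reproduces the worked example in the text: $(1, x)$ at type $\mathsf{int}^{S} \times \mathsf{int}^{C_{01}}$ yields $(1, l_{01})$.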

When applying a function, $\mathsf{apply}(x_1, x_2)$, rule (App) first creates a set of fresh modifiable destinations using mod, then supplies both the destination set $\overline{l}$ and the argument $x_2$ to the function $x_1$. Although the destination names $l_i$ may overlap with the destination names of the current function, these variables are only locally scoped: the application returns a new value that contains the supplied destinations $\overline{l}$, but they are never mentioned outside of the function application.
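At the level of generated code, the (App) rule can be pictured as a chain of `mod` allocations followed by the application. The Python sketch below renders that shape as a string; the function name and the `apply^[...]` notation are hypothetical and chosen only for readability.

```python
from typing import List


def translate_app(f: str, arg: str, labels: List[str], placeholders: List[str]) -> str:
    """(App) sketch: allocate fresh destinations with mod, then apply f to them and arg."""
    if len(labels) != len(placeholders):
        raise ValueError("one placeholder per destination is required")
    allocs = "".join(f"let {lab} = mod {v} in " for lab, v in zip(labels, placeholders))
    return f"{allocs}apply^[{', '.join(labels)}]({f}, {arg})"


print(translate_app("f", "x", ["l1", "l2"], ["0", "0"]))
# let l1 = mod 0 in let l2 = mod 0 in apply^[l1, l2](f, x)
```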

The translation rules are guided only by local information: the structure of types and terms. This locality is key to simplifying the algorithm and the implementation, but it often generates code with redundant operations. For example, the translation rules can generate expressions like $\mathsf{read}\ x\ \mathsf{as}\ x'\ \mathsf{in}\ \mathsf{write}(l_\rho, x')$, which is equivalent to $x$. We can easily apply rewriting rules after the translation to eliminate these redundant operations.

Translation correctness. Given a constraint-based source typing derivation and an assignment $\varphi$ for some term $e$, there are translations from $e$ to (1) a target expression $e_t$ and (2) a destination return expression $e_r$, with appropriate target types:

Theorem 6.1. If $C; \Psi; \Gamma \vdash e : \tau$ and $\varphi$ is a satisfying assignment for $C$, then

(1) there exist $e_t$ and $\Gamma'$ such that $[\varphi]\Gamma \vdash e : [\varphi]\tau \rightsquigarrow e_t$ and $\|[\varphi]\Gamma\| \vdash e_t : \|[\varphi]\tau\| \dashv \Gamma'$, and

(2) there exist $e_r$ and $\Gamma'$ such that $[\varphi]\Gamma \vdash e : [\varphi]\tau \rightsquigarrow_r e_r$ and $\|[\varphi]\Gamma\| \vdash e_r : \|[\varphi]\tau\| \dashv \Gamma'$.

The proof is by induction on the height of the given derivation of $C; \Psi; \Gamma \vdash e : \tau$. It relies on a substitution lemma for the (SLet) case. We present the full proof in the appendix [14].

7. Probabilistic Chunking 

Precise dependency tracking saves space by eliminating redundant 
dependencies. But even then, the dependency metadata required can 
still be large, preventing scaling to large datasets. In this section, 
we show how to reduce the size of dependency metadata further by 
controlling the granularity of dependency tracking, crucially in a 
way that does not affect performance disproportionately. 

The basic idea is to track dependencies at the granularity of a block of items. This idea is straightforward to implement: simply place blocks of data into modifiables (e.g., store an array of integers as a block instead of just one number). Consequently, if any data in a block changes, the computation that depends on that block must be rerun. While this saves space, the key question for performance is how to chunk data into blocks without disproportionately affecting the update time.

For fast updates, our chunking strategy must ensure that a small change to the input remains small and local, without affecting many other blocks. The simple strategy of chunking into fixed-size blocks does not work. To see why, consider the example in Figure 15 (left half), where a list containing the numbers 1 through 16, missing 2, is chunked into equal-sized blocks of 4. The trouble begins when we insert 2 into the list between 1 and 3. With fixed-size chunking, all the blocks change because the insertion shifts the position of every block boundary by one. As a result, when tracking dependencies at the level of blocks, we cannot reuse any prior computations and essentially recompute the result anew.

Figure 15. Fixed-size chunking versus probabilistic chunking with block size B = 4. Next to each data cell in the original list (top) is a unique identifier (location). The hash values of these identifiers (used in probabilistic chunking) are shown in the table, with values divisible by B = 4 marked with an arrow.
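The failure of fixed-size chunking on the running example can be reproduced in a few lines of Python. This sketch uses plain tuples as blocks and is meant only to replay the left half of Figure 15.

```python
def fixed_chunks(xs, b):
    """Split xs into consecutive blocks of b items (the last block may be shorter)."""
    return [tuple(xs[i:i + b]) for i in range(0, len(xs), b)]


before = [1] + list(range(3, 17))   # 1..16 without 2
after = list(range(1, 17))          # 2 inserted between 1 and 3

old_blocks = set(fixed_chunks(before, 4))
new_blocks = fixed_chunks(after, 4)
reused = [blk for blk in new_blocks if blk in old_blocks]
print(fixed_chunks(before, 4))  # [(1, 3, 4, 5), (6, 7, 8, 9), (10, 11, 12, 13), (14, 15, 16)]
print(reused)                   # []
```

No block survives the single insertion, so every block-level dependency would be invalidated.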

We propose a probabilistic chunking scheme (PCS), which decouples the locations of block boundaries from the data contents and from absolute positions in the list, while allowing users to control the block size probabilistically. Using randomization, we prevent small (even adversarial) changes from spreading to the rest of the computation. Similar probabilistic chunking schemes have been proposed in other work, but with a different aim: they seek to discover similarities across pieces of data (see, e.g., [43, 53] and the references therein) rather than to create independence between the data and how it is chunked, as we do here.

PCS takes a target block size $B$ and determines block boundaries by hashing the location, or unique identifier, of each data item and declaring the item a block boundary if its hash is divisible by $B$. Figure 15 (right) illustrates how this works. Consider, again, a list holding the numbers 1 to 16, missing 2, with their location identifiers (a, b, ...) shown next to them. PCS chunks this list into blocks of expected size $B = 4$ by applying a random hash function to each item. For this example, the hash values are given in the table on the right of the figure; hash values divisible by 4 are marked with an arrow. PCS declares block boundaries where the hash value is $0 \bmod B$ (here $B = 4$), thereby selecting, in expectation, 1 in 4 elements to be on a boundary. This means the blocks finish at 4, 9, and 11, as shown.

To understand what happens when the input changes, consider inserting 2 (with location identifier p) between 1 and 3. Because the hash value of p is 13, which is not divisible by 4, p is not on a boundary. This is the common case, as there is only a $1/B$ probability that a random hash value is divisible by $B$. As a result, only the block [1, 3, 4], to which 2 is added, is affected. If, however, 2 happened to be a boundary element, we would have only two new blocks (inserting 2 splits an existing block into two). Either way, the rest of the list remains unaffected, enabling computations that depended on other blocks to be reused. Deletion is symmetric.
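The following Python sketch implements the boundary rule described above and replays the insertion of 2. As an assumption for reproducibility, SHA-256 of the identifier stands in for the random hash, so the boundaries differ from the hash values shown in Figure 15. The locality property holds regardless of the hash: at most one old block disappears and at most two new blocks appear.

```python
import hashlib


def h(ident: str) -> int:
    """Deterministic stand-in for a random hash of an item's location identifier."""
    return int.from_bytes(hashlib.sha256(ident.encode()).digest()[:8], "big")


def pcs_chunks(items, b):
    """Probabilistic chunking: a block ends at every item whose hash is 0 mod b."""
    blocks, current = [], []
    for ident, value in items:
        current.append((ident, value))
        if h(ident) % b == 0:          # boundary element closes the current block
            blocks.append(tuple(current))
            current = []
    if current:
        blocks.append(tuple(current))
    return blocks


values = [1] + list(range(3, 17))                      # 1..16 without 2
before = list(zip("abcdefghijklmno", values))          # identifiers a..o
after = before[:1] + [("p", 2)] + before[1:]            # insert 2 (identifier p)

old, new = set(pcs_chunks(before, 4)), set(pcs_chunks(after, 4))
# At most one old block disappears; one or two new blocks appear.
print(len(old - new), len(new - old))
```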

To conclude, by chunking a dataset into blocks of expected size $B$, probabilistic chunking reduces the dependency metadata by a factor of $B$ in expectation. Furthermore, by keeping changes small and local, probabilistic chunking maximizes the reuse of existing computations. Change propagation works analogously to the non-block version, except that if a block changes, the work on the whole block must be redone, which often increases the update time by a factor of about $B$.
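A short calculation supports the factor-of-$B$ estimate. It assumes that the hash behaves like a uniform random function, so that each of the $n$ items is a boundary with probability $1/B$; linearity of expectation then bounds the number of blocks, each of which carries one dependency instead of one per item:

```latex
\Pr[\text{item } i \text{ is a boundary}] = \frac{1}{B}
\quad\Longrightarrow\quad
\mathbb{E}[\#\text{boundaries}] = \sum_{i=1}^{n} \frac{1}{B} = \frac{n}{B},
\qquad
\mathbb{E}[\#\text{blocks}] \le \frac{n}{B} + 1 .
```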

8. Evaluation 

We performed an extensive empirical evaluation on a range of benchmarks, including standard benchmarks from prior work as well as new, more involved benchmarks on social network graphs. We report selected results in this section. All our experiments were performed on a 2GHz Intel Xeon with 1 TB of memory running Linux. Our implementation is single-threaded and therefore uses only one core. The code was compiled with MLton version 20100608 with flags to measure maximum live memory usage.

8.1 Benchmarks and Measurements 

We have completed an implementation of the target language as a Standard ML (SML) library. The implementation follows the formalism except for the following: (1) it treats both fresh and finalized modifiable types as a single $\tau\ \mathsf{mod}$ type; (2) for a function $\mathsf{fun}^{\overline{l}}\, f(x) = e$, it includes the destination labels as part of the function argument, so the function is represented as fun f (x) = fn $\overline{l}$ => e. Accordingly, the arrow type $\tau_1 \xrightarrow{D} \tau_2$ is represented as $\tau_1 \to \tau' \to \tau_2$, where $\tau' = \tau'_1\ \mathsf{mod} \times \cdots \times \tau'_n\ \mathsf{mod}$ and $D = (\tau'_1, \ldots, \tau'_n)$.

Since our approach provides for an expressive language (any 
pure SML program can be made self-adjusting), we can implement 
a variety of domain-specific languages and algorithms. For the 
evaluation, we implemented the following: 

• a blocked list abstract data type that uses our probabilistic chunking algorithm (Section 7),

• a sparse matrix abstract data type,

• an implementation of the MapReduce framework [20] that uses the blocked lists,

• several list operations and the merge sort algorithm,

• more sophisticated algorithms on graphs, which use the sparse-matrix data type to represent graphs in the compressed sparse row format, where each row of the matrix represents a vertex and includes only the nonzero entries.

In our graph benchmarks, we control the space-time trade-off by 
treating a block of 100 nonzero elements as a single changeable unit. 
For the graphs used, this block size is quite natural, as it corresponds 
roughly to the average degree of a node (the degree ranges between 
20 and 200 depending on the graph). 
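As a rough illustration of the graph representation, the Python sketch below builds a compressed sparse row matrix and groups one row's nonzeros into blocks that would each act as one changeable unit. How the implementation splits a row is not stated here. For brevity, the sketch uses a fixed split within the row, whereas Section 7's probabilistic chunking would be the natural choice for update locality. All names are hypothetical.

```python
def to_csr(n_rows, entries):
    """Build a compressed sparse row matrix from (row, col, value) nonzeros."""
    entries = sorted(entries)
    row_ptr = [0] * (n_rows + 1)
    for r, _, _ in entries:
        row_ptr[r + 1] += 1
    for r in range(n_rows):               # prefix sums: row r spans row_ptr[r]:row_ptr[r+1]
        row_ptr[r + 1] += row_ptr[r]
    col_idx = [c for _, c, _ in entries]
    values = [v for _, _, v in entries]
    return row_ptr, col_idx, values


def row_blocks(csr, row, block=100):
    """Group one row's nonzeros into blocks of at most `block` entries.
    Assumption: a simple fixed split per row; each block would be one changeable unit."""
    row_ptr, col_idx, values = csr
    start, end = row_ptr[row], row_ptr[row + 1]
    pairs = list(zip(col_idx[start:end], values[start:end]))
    return [pairs[i:i + block] for i in range(0, len(pairs), block)]


csr = to_csr(3, [(0, 1, 1.0), (2, 0, 2.0), (0, 2, 3.0)])
print(csr)  # ([0, 2, 2, 3], [1, 2, 0], [1.0, 3.0, 2.0])
```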

For each benchmark, we implemented a batch version (an optimized implementation that operates on unchanging inputs) and a self-adjusting version using the techniques proposed in this paper. We compare these versions by considering a mix of synthetic and real-world data, and by considering different forms of changes ranging from small unit changes (e.g., insertion/deletion of one item) to aggregate changes consisting of many unit changes (e.g., insertion/deletion of 1000 items). We describe the specific datasets employed and the changes performed in the description of each experiment.

8.2 Block Lists and Sorting 

Using our block list representation, we implemented batch and self- 
adjusting versions of several standard list primitives such as map, 
partition, and reduce as well as the merge sort algorithm msort. 
In the evaluation, all benchmarks operate on integers: map applies 
/((') = i-i-2 to each element; partition partitions its input based 
on the parity of each element; reduce computes the sum of the list 
modular 100; and msort implements merge sort. 
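For reference, the batch (non-incremental) behavior of the three list benchmarks, as configured above, can be sketched in Python over a list of blocks. These functions are hypothetical illustrations of the semantics only, not the paper's SML block-list library, and msort is omitted.

```python
from functools import reduce


def bl_map(blocks, f):
    return [[f(x) for x in blk] for blk in blocks]


def bl_partition(blocks, pred):
    """Split each block by pred, keeping the block structure of both outputs."""
    return ([[x for x in blk if pred(x)] for blk in blocks],
            [[x for x in blk if not pred(x)] for blk in blocks])


def bl_reduce(blocks, op, unit):
    """Reduce each block, then combine the per-block results (op must be associative)."""
    return reduce(op, (reduce(op, blk, unit) for blk in blocks), unit)


blocks = [[1, 2, 3], [4, 5]]
print(bl_map(blocks, lambda i: i + 2))                        # [[3, 4, 5], [6, 7]]
print(bl_partition(blocks, lambda i: i % 2 == 0))             # ([[2], [4]], [[1, 3], [5]])
print(bl_reduce(blocks, lambda a, b: (a + b) % 100, 0))       # 15
```

Reducing within blocks first is what lets a self-adjusting version redo work only for the blocks that changed.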
Table 1 reports our measurements at a fixed input size of $10^7$. For each benchmark, we consider three different versions: (1) a batch version (written with the -batch suffix); (2) a self-adjusting version without the chunking scheme (the first row below the batch version); and (3) the self-adjusting version with different block sizes ($B = 3, 10, \ldots$). We report the block size used (B); the time to run from scratch (denoted by "Run") in seconds; and the average time for change propagation after one insertion/deletion from the input list (denoted by "Prop.") in milliseconds. For batch versions, the propagation time (i.e., a rerun) is the same as a complete from-scratch run. We calculate the speedup as the ratio of the batch version's from-scratch run time to the average propagation time, i.e., the performance improvement obtained by the self-adjusting version with respect to the batch version of the same benchmark. The "Memory" column shows the maximum memory footprint. The experiments show that as the block size increases, both the self-adjusting (from-scratch) run time and memory generally decrease, confirming that larger blocks generate fewer dependencies. As the block size increases, the time for change propagation increases as well, roughly in proportion to the block size. (From B = 3 to B = 10, propagation time decreases, because the benefit of processing more elements per block exceeds the overhead of accessing the blocks.)
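As a check on how the Speedup column is derived, the Python snippet below recomputes the B = 100 speedups from the Table 1 values. Batch times are given in milliseconds, i.e., the batch row's Prop. column. Rounding to the nearest integer is an assumption that happens to match these four entries.

```python
# (batch from-scratch time in ms, self-adjusting average propagation time in ms), B = 100, from Table 1
measurements = {
    "map": (497, 0.048),
    "partition": (557, 0.049),
    "reduce": (330, 0.083),
    "msort": (12820, 3.033),
}

speedups = {name: round(batch / prop) for name, (batch, prop) in measurements.items()}
print(speedups)  # {'map': 10354, 'partition': 11367, 'reduce': 3976, 'msort': 4227}
```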



Benchmark         B       Run (s)   Prop. (ms)   Speedup   Memory
map-batch         1       0.497     497          1         344M
map               1       11.21     0.001        497000    7G
map               3       16.86     0.012        41416     10G
map               10      5.726     0.009        55222     3G
map               100     1.796     0.048        10354     1479M
map               1000    1.370     0.635        783       1192M
map               10000   1.347     9.498        52        1168M
partition-batch   1       0.557     557          1         344M
partition         1       10.42     0.015        37133     8G
partition         3       20.06     0.033        16878     14G
partition         10      6.736     0.028        19892     3G
partition         100     1.920     0.049        11367     1508M
partition         1000    1.420     0.823        677       1159M
partition         10000   1.417     11.71        47        1124M
reduce-batch      1       0.330     330          1         344M
reduce            1       9.529     0.064        5156      5G
reduce            3       13.39     0.129        2558      6G
reduce            10      4.230     0.085        3882      1317M
reduce            100     0.990     0.083        3976      592M
reduce            1000    0.627     0.075        4400      420M
reduce            10000   0.593     0.244        1352      327M
msort-batch       1       12.82     12820        1         1.3G
msort             1       676.4     0.956        13410     121G
msort             3       725.0     1.479        8668      157G
msort             10      204.4     1.012        12668     44G
msort             100     52.00     3.033        4227      10G
msort             1000    43.80     22.36        573       9G
msort             10000   35.35     119.7        107       8G

Table 1. Blocked lists and sorting: time and space with varying block sizes on fixed input sizes of 10^7.

In terms of memory usage, the version without block lists (B = 1) requires 15-100x more memory than the batch version. Block lists significantly reduce the memory footprint. For example, with block size B = 100, the benchmarks require at most 7x more memory than the batch version while still providing 4000-10000x speedups. In our experiments, we confirm that probabilistic chunking (Section 7) is essential for performance: when using fixed-size chunking, merge sort does not yield noticeable improvements.
Figure 16. Run time (seconds) of incremental word count.



Benchmark       Source          Input size                       Prop. (s)   Speedup   Memory
PR-Batch        Orkut           3×10^6 vertices, 1×10^8 edges    7           1         3G
PageRank        Orkut           3×10^6 vertices, 1×10^8 edges    0.021       333       36G
PR-Batch        LiveJournal-1   4×10^6 vertices, 3×10^7 edges    18          1         5G
PageRank        LiveJournal-1   4×10^6 vertices, 3×10^7 edges    0.023       783       61G
PR-Batch        Twitter-1       3×10^7 vertices, 7×10^8 edges    137         1         50G
PageRank        Twitter-1       3×10^7 vertices, 7×10^8 edges    0.254       539       495G
Conn-Batch      LiveJournal-2   1×10^6 vertices, 8×10^6 edges    105         1         4G
Connectivity    LiveJournal-2   1×10^6 vertices, 8×10^6 edges    0.531       198       140G
SC-Batch        Twitter-2       1×10^5 vertices, 2×10^6 edges    8           1         2G
Social Circle   Twitter-2       1×10^5 vertices, 2×10^6 edges    0.079       101       34G

Table 2. Incremental sparse graphs: time and space. 



8.3 Word Count 

A standard microbenchmark for big-data applications is word count, which maintains the frequency of each word in a document. Using our MapReduce library (run with block size 1,000), we implemented a batch version and a self-adjusting version of this benchmark; the latter can update the frequencies as the document changes over time.
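For readers unfamiliar with the benchmark, the three MapReduce phases of word count can be sketched in Python as a batch computation. The function names are hypothetical and do not reflect the API of the paper's MapReduce library; the self-adjusting version maintains the result of the same phases as the input changes.

```python
from collections import defaultdict


def map_phase(docs):
    """Emit (word, 1) for every word occurrence."""
    return [(w, 1) for doc in docs for w in doc.split()]


def shuffle_phase(pairs):
    """Group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(docs):
    return reduce_phase(shuffle_phase(map_phase(docs)))


print(word_count(["a rose is", "a rose"]))  # {'a': 2, 'rose': 2, 'is': 1}
```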

We use this benchmark to illustrate, in isolation, the impact of our precise dependency tracking mechanism. To this end, we implemented two versions of word count: one using prior art [16] (which contains redundant dependencies) and the other using the techniques presented in this paper. We use a publicly available Wikipedia dataset¹ and simulate the evolution of the document by dividing it into blocks and incrementally adding these blocks to the existing text; the whole text has about 120,000 words.

Figure 16 shows the time to insert 1,000 words at a time into the existing corpus, where the horizontal axis shows the corpus size at the time of insertion. Note that the two curves differ only in whether the new precise dependency tracking is used. Overall, both incremental versions appear to follow a logarithmic trend because, in this case, both the shuffle and reduce phases require $O(\log n)$ time for a single-entry update, where $n$ is the number of input words. Importantly, with precise dependency tracking (PDT), the update time is about 6x faster than without it. In terms of memory consumption, PDT is 2.4x more space efficient. Compared to a batch run, PDT is about 100x faster for a corpus of 100K words or larger (since we change 1,000 words per update, this is essentially optimal).

¹ Wikipedia dataset: http://wiki.dbpedia.org/

8.4 PageRank: Two Implementations

Another important big-data benchmark is the PageRank algorithm, which computes the page rank of each vertex (site) in a graph (network). This algorithm can be implemented in several ways. For example, a domain-specific language such as MapReduce can be (and often is) used, even though it is known that the shuffle step required by MapReduce is not needed for this algorithm. We implemented the PageRank algorithm in two ways: once using our MapReduce library and once using a direct implementation, which takes advantage of the expressive power of our framework. Both implementations use the same block size of 100 for the underlying block-list data type. The second implementation is an iterative algorithm that performs a sparse matrix-vector multiplication at each step until convergence.

In both implementations, we use floating-point numbers to represent PageRank values. Because equality checks on floating-point numbers are imprecise, we set three parameters to control the precision of our computation: (1) the iteration convergence threshold $con_\varepsilon$; (2) the equality threshold for PageRank values $eq_\varepsilon$, i.e., if a PageRank value changes by no more than $eq_\varepsilon$, we do not recompute it; and (3) the equality threshold for verifying the correctness of the result, $verify_\varepsilon$. For all our experiments, we set $con_\varepsilon = 1 \times 10^{-6}$ and $eq_\varepsilon = 1 \times 10^{-8}$. For each change, we also perform a batch run to ensure the correctness of the result. All our experiments guarantee that $verify_\varepsilon < 1 \times 10^{-5}$.
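The role of the two thresholds can be illustrated with a plain (non-incremental) power-iteration PageRank in Python. Several details are assumptions not stated in the text: the damping factor 0.85, the L1 norm as the convergence measure, and uniform redistribution of rank from vertices without out-links. The helper `significant_changes` is a hypothetical rendering of the $eq_\varepsilon$ cutoff, marking which vertices would be treated as changed.

```python
def pagerank(out_links, damping=0.85, con_eps=1e-6, max_iter=1000):
    """Power iteration for PageRank on an adjacency list of out-links.
    Stops when the L1 change between iterations drops below con_eps."""
    n = len(out_links)
    ranks = [1.0 / n] * n
    for _ in range(max_iter):
        # Mass from vertices without out-links is spread uniformly (an assumption).
        dangling = sum(ranks[u] for u in range(n) if not out_links[u])
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for u, targets in enumerate(out_links):
            for v in targets:
                new[v] += damping * ranks[u] / len(targets)
        delta = sum(abs(a - b) for a, b in zip(new, ranks))
        ranks = new
        if delta < con_eps:
            break
    return ranks


def significant_changes(old, new, eq_eps=1e-8):
    """Vertices whose rank moved by more than eq_eps; smaller moves count as unchanged."""
    return {v for v, (a, b) in enumerate(zip(old, new)) if abs(a - b) > eq_eps}
```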

Our experiments with PageRank show that the MapReduce-based implementation does not scale for incremental computation, because it requires massive amounts of memory, consuming 80GB of memory even for a small downsampled Twitter graph with $3 \times 10^3$ vertices and $10^4$ edges. After careful profiling, we found that this is due to the shuffle step performed by MapReduce, which is not needed for the PageRank algorithm. This is an example where a domain-specific approach such as MapReduce is too restrictive for an efficient implementation.

Our second implementation, which uses the expressive power of functional programming, performs well. Compared to the MapReduce-based version, it requires 0.88GB of memory on the same graph, nearly 100-fold less, and its update time is 50x faster on average.² We are thus able to use the second implementation on relatively large graphs. Table 2 shows a summary of our findings. For these experiments, we divide the edges into groups of 1,000 edges, starting with the first vertex, and consider each group in turn: for each group, we measure the time to complete the following steps: (1) delete all the edges in the group, (2) update the result, (3) reintroduce the edges, and (4) update the result. Since the average degree per vertex is approximately 100, each aggregate change affects approximately 10 vertices, and the change can then propagate to other vertices. (Since the vertices are ordered arbitrarily, this aggregate change can be viewed as inserting/deleting 10 arbitrarily chosen vertices.)

Our PageRank implementation delivers significant speedups at the cost of approximately 10x more memory on different graphs, including the Orkut³, LiveJournal⁴, and Twitter⁵ graphs. For example, on the Twitter dataset (labeled Twitter-1) with 30M vertices and 700M edges, our PageRank implementation reaches an average speedup of more than 500x compared to the batch version, at the cost of 10x more memory. Detailed measurements for the first 100 groups, shown in Figure 17 (left), show that for most trials, speedups are close to 4 orders of magnitude.

2 This performance gap increases with the input size, so this is quite a
conservative number.

3 Orkut dataset: http://snap.stanford.edu/data/com-Orkut.html

4 LiveJournal dataset:
http://snap.stanford.edu/data/com-LiveJournal.html

5 Twitter dataset: http://an.kaist.ac.kr/traces/WWW2010.html

8.5 Incremental graph connectivity

Connectivity, which indicates the existence of a path between
two vertices, is a central graph problem with many applications.
Our incremental graph connectivity benchmark computes a label
ℓ(v) ∈ ℤ⁺ for every node v of an undirected graph such that two
nodes u and v have the same label (i.e., ℓ(u) = ℓ(v)) if and only
if u and v are connected. We use a randomized version of Kang
et al.'s algorithm [36] that starts with random initial labels for
improved incremental efficiency. The algorithm is iterative; in each
iteration the label of each vertex is replaced with the minimum of its
own label and those of its neighbors. We evaluate the efficiency of the
algorithm under dynamic changes as follows: for each vertex, we delete
that vertex, update the result, and reintroduce the vertex. We test the
benchmark on an undirected graph from LiveJournal with 1M nodes
and 8M edges. Our findings for 100 randomly selected vertices are
shown in Figure 17 (center); cumulative (average) measurements are
shown in Table 2. Since deleting a vertex can cause widespread changes
in connectivity, affecting many vertices, we expect this benchmark to
be significantly more expensive than PageRank. Indeed, each change
is more expensive than in PageRank, but we still obtain speedups of
as much as 200x.
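To make the label-propagation scheme concrete, here is a minimal
from-scratch sketch in Python under stated assumptions: the graph is a
symmetric adjacency dictionary, and the "random initial labels" are a
random permutation of 1..n, so that distinct components necessarily end
with distinct minima. The names `random_labels`, `propagate_min_labels`,
and `delete_vertex` are hypothetical; this is not the authors'
self-adjusting implementation, and it recomputes everything after a
change instead of propagating the change incrementally.

```python
import random
from typing import Dict, List, Set

# Undirected graph as an adjacency dictionary: vertex -> set of neighbors.
# Assumption: adjacency is symmetric (u in graph[v] iff v in graph[u]).
Graph = Dict[int, Set[int]]


def random_labels(vertices: List[int], seed: int = 0) -> Dict[int, int]:
    """Assign distinct random initial labels (a permutation of 1..n)."""
    rng = random.Random(seed)
    labels = list(range(1, len(vertices) + 1))
    rng.shuffle(labels)
    return dict(zip(vertices, labels))


def propagate_min_labels(graph: Graph, initial: Dict[int, int]) -> Dict[int, int]:
    """Iterate until no label changes: each vertex takes the minimum of its
    own label and its neighbors' labels. At the fixpoint every vertex holds
    the smallest initial label in its connected component."""
    label = {v: initial[v] for v in graph}
    changed = True
    while changed:
        changed = False
        # Synchronous round: new labels are computed from the previous round.
        new_label = {}
        for v, nbrs in graph.items():
            best = min([label[v]] + [label[u] for u in nbrs])
            if best != label[v]:
                changed = True
            new_label[v] = best
        label = new_label
    return label


def delete_vertex(graph: Graph, v: int) -> Graph:
    """Return a copy of the graph with v and its incident edges removed."""
    return {u: nbrs - {v} for u, nbrs in graph.items() if u != v}


if __name__ == "__main__":
    g = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}
    init = random_labels(sorted(g), seed=7)
    before = propagate_min_labels(g, init)
    after = propagate_min_labels(delete_vertex(g, 2), init)
    print(before[1] == before[3], after[1] == after[3])  # True False
```

Deleting vertex 2 splits the path 1-2-3, so vertices 1 and 3 end up
with different labels. Reusing the same `initial` dictionary before and
after the deletion keeps the two runs comparable, which is the kind of
stability an incremental version would need in order to reuse work.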

8.6 Incremental social circles 

An important quantity in social networks is the size of the circle of
influence of a member of the network. Using advances in streaming
algorithms, our final benchmark estimates, for each vertex v, the
number of vertices reachable from v within 2 hops (i.e., how many
friends and friends of friends a person has). Our implementation
is similar to that of Kang et al. [35], which maintains for each node 10
Flajolet-Martin sketches (each a 32-bit word). The technique extends
naturally to computing the number of nodes reachable from a starting
point within k hops (k > 2). To evaluate this benchmark,
we use a down-sampled Twitter graph (Twitter-2) with 100K nodes
and 2M edges. The experiment divides the edges into groups of 20
edges and considers each group in turn, measuring the time to
complete the following steps: delete the edges in the group, update
the social-circle sizes, reintroduce the edges, and update the
social-circle sizes again. The findings for 100 groups are
shown in Figure 17 (right); cumulative (average) measurements are
shown in the last row of Table 2. Our incremental version is
approximately 100x faster than the batch version for most trials.
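The sketch-based estimate can be illustrated with a minimal Python
sketch, assuming a symmetric adjacency dictionary and using a seeded
pseudo-random generator in place of hash functions. Following the text,
each vertex gets 10 Flajolet-Martin sketches of 32 bits; bit i of a
fresh sketch is set with probability 2^-(i+1), one hop of expansion ORs
each vertex's sketches with those of its neighbors, and the count is
estimated from the average position of the lowest zero bit, divided by
the standard correction constant 0.77351. The function names are
hypothetical, the estimate counts v itself, and this version recomputes
from scratch rather than incrementally.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]

NUM_SKETCHES = 10  # per the text: 10 Flajolet-Martin sketches per vertex
BITS = 32          # each sketch is a 32-bit word
PHI = 0.77351      # Flajolet-Martin correction constant


def initial_sketches(vertices: List[int], seed: int = 0) -> Dict[int, List[int]]:
    """Give each vertex NUM_SKETCHES one-bit words. The set bit is the number
    of trailing zeros of a random 32-bit value, so bit i is chosen with
    probability 2^-(i+1). A seeded PRNG stands in for hash functions."""
    rng = random.Random(seed)
    sketches = {}
    for v in vertices:
        words = []
        for _ in range(NUM_SKETCHES):
            x = rng.getrandbits(BITS)
            r = (x & -x).bit_length() - 1 if x else BITS - 1
            words.append(1 << r)
        sketches[v] = words
    return sketches


def expand(graph: Graph, sketches: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """One hop: OR each vertex's sketches with those of its neighbors."""
    out = {}
    for v, nbrs in graph.items():
        words = list(sketches[v])
        for u in nbrs:
            words = [a | b for a, b in zip(words, sketches[u])]
        out[v] = words
    return out


def estimate(words: List[int]) -> float:
    """Average the index of the lowest zero bit over all sketches, then
    apply the Flajolet-Martin estimator 2^mean / PHI."""
    total = 0
    for w in words:
        r = 0
        while r < BITS and (w >> r) & 1:
            r += 1
        total += r
    return 2 ** (total / len(words)) / PHI


def within_k_hops(graph: Graph, k: int = 2, seed: int = 0) -> Dict[int, float]:
    """Estimated number of vertices (including v) within k hops of each v."""
    sketches = initial_sketches(sorted(graph), seed)
    for _ in range(k):
        sketches = expand(graph, sketches)
    return {v: estimate(s) for v, s in sketches.items()}
```

Because OR is idempotent and commutative, two rounds of `expand` yield
exactly the OR of the initial sketches over each vertex's 2-hop ball,
and running k rounds covers the k-hop extension mentioned above.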

9. Related Work 

Incremental computation techniques have been extensively studied 
in several areas of computer science. Much of this research focuses 
on time efficiency rather than space efficiency. In addition, there 
is relatively little (if any) work on providing control over the 
space-time tradeoff fundamental to essentially any
incremental-computation technique. We discussed closely related
work in the introduction (Section 1). In this section, we present a
brief overview
of some of the more remotely related work. 

Algorithmic Solutions. Research in the algorithms community 
focuses primarily on devising dynamic algorithms or dynamic data 
structures for individual problems. There have been hundreds of 
papers, with several excellent surveys reviewing the work (e.g., [23,
47]). Dynamic algorithms enable computing a desired property
while allowing modifications to the input (e.g., inserting/deleting
elements). These algorithms are often carefully designed to exploit
problem-specific structure and are therefore highly efficient. But
they can be quite complex and difficult to design, analyze, and
implement, even for problems that are simple in the batch model,
where no changes to data are allowed. While dynamic algorithms
can, in principle, be used with large datasets, space consumption
is a major problem [22]. Bader et al. [48] present techniques for
implementing certain dynamic graph algorithms for large graphs.

Language-Based Approaches. Motivated by the difficulty of
designing and implementing ad hoc dynamic algorithms, the
programming languages community works on developing general-purpose,
language-based solutions to incremental computation. This research
has led to the development of many approaches [21, 27, 46, 47],
including static dependency graphs [21], memoization [46], and
partial evaluation [27]. Recent advances in self-adjusting
computation [1, 6] build on this prior work to offer techniques for
efficient incremental computation expressed in general-purpose,
purely functional and imperative languages. Variants of self-adjusting
computation have been implemented in SML [1], Haskell [12], C [30],
and OCaml [31]. The techniques have been applied to a number
of problems in a relatively diverse set of domains, including
motion simulation [2, 5], dynamic computational geometry [7, 8], and
machine learning [3, 51].

Figure 17. (left) PageRank: 100 trials (x-axis) of deleting 1,000 edges; (center) Connectivity: 100 trials of deleting a vertex; (right)
Approximate social-circle size: 100 trials of deleting 20 edges. Note: y-axis is in log-scale.

In more recent work, researchers have proposed improvements to the
power of the underlying self-adjusting computation techniques.
Hammer et al. proposed techniques for demand-driven self-adjusting
computation, where updates may be delayed until they are
demanded [31]. Another line of research recognized an interesting
duality between incremental and parallel computation (both benefit
from identifying independent computations) and proposed techniques
for parallel self-adjusting computation. Some earlier work considered
techniques for performing efficient parallel updates in the context
of a lambda calculus extended with fork-join style parallelism [29].
Follow-up work considered the technique in the context of a more
sophisticated problem, showing both theoretical and empirical results
of its effectiveness [7]. Burckhardt et al. consider a more powerful
language based on concurrent revisions, provide techniques for
parallel change propagation for programs written in this language,
and perform an experimental evaluation. Their evaluation shows
relatively broad effectiveness on a challenging set of benchmarks [11].

Systems. There are several systems for big-data computations,
such as MapReduce [20], Dryad [32], Pregel [40], GraphLab [39],
and Dremel [41]. While these systems allow for computing with
large datasets, they are primarily aimed at supporting the batch
model of computation, where data does not change, and they target
specific domains such as flat data-parallel algorithms and
certain graph algorithms.

Data flow systems like MapReduce and Dryad have been
extended with support for incremental computation. MapReduce
Online [18] can react efficiently to additional input records. Nectar [28]
caches the intermediate results of DryadLINQ programs and
generates programs that can reuse results from this cache. Prior work on
Incoop applies the principles of self-adjusting computation to the
big-data setting, but only in the context of MapReduce, a
domain-specific language, by extending Hadoop to operate on dynamic
datasets [10]. In addition, Incoop's change-propagation algorithm is
asymptotically suboptimal. Naiad [42] enables incremental
computation on dynamic datasets in programs written with a specific
set of data-flow primitives. In Naiad, dynamic updates cannot alter
the dependency structure of the computation. Naiad is thus closely
related to earlier work on incremental computation with static
dependency graphs [21, 56]. Percolator [45] is Google's proprietary
system that enables a more general programming model but requires
programming in an event-based model with callbacks (notifications),
a very low level of abstraction. While domain-specific, these
systems can all run in parallel and on multiple machines. The work
that we presented here assumes sequential computation.

Functional Reactive Programming. More remotely related
work includes functional reactive programming. Elliott and
Hudak [26] introduced functional reactive programming (FRP) to
provide primitives for operating on time-varying values. While
highly expressive, Elliott and Hudak's proposal turned out to be
difficult to implement safely and efficiently, leading to much
follow-up work on refinements such as real-time FRP [54], event-driven
FRP [55], and arrowized FRP [38], which restrict the set of acceptable
FRP programs by using syntax and types to make efficient
implementation possible. More recent approaches to FRP based on temporal
logics include those of Sculthorpe and Nilsson [49], Jeffrey [33],
Jeltsch [34], and Krishnaswami [37].

Much of the work on FRP can be viewed as a generalization of
synchronous dataflow languages [9, 13] to handle richer computations
where the dataflow graph can accept certain changes between
steps. One limitation of the synchronous approach to reactive
programming is that one step cannot be started before the previous one
finishes. This leads to a range of other practical difficulties, such as
choosing the right frequency (or step size) for updates [19, 50].
Czaplicki and Chong propose techniques for asynchronous execution
that allow certain computations to span multiple time steps [19].

While it appears likely that FRP programs would benefit from 
the efficiency improvements of incremental updates, much of the 
aforementioned work does not provide support for incremental 
updates. One exception is the recent work of Demetrescu et al., which
provides the programmer with techniques for writing incremental
update functions in (imperative) reactive programs [24]. Another
exception is Donham's Froc [25], which provides support for
FRP based on a data-driven implementation using self-adjusting 
computation. 

10. Conclusion 

We present techniques for improving the scalability of automatic 
incrementalization techniques based on self-adjusting computation. 
These techniques enable expressing big-data applications in a
functional language and rely on (1) an information-flow type system and
translation algorithm for tracking dependencies precisely, and (2) a
probabilistic-chunking technique for controlling the fundamental
space-time tradeoff that self-adjusting computation offers. Our
results are encouraging, leading to important improvements over prior
work and delivering significant speedups over batch computation
at the cost of moderate space overheads. Our results also show that
functional programming can be significantly more effective than 
domain-specific languages such as MapReduce. In future work, we 
plan to parallelize these techniques, which would enable scaling 
to larger problems that require multiple computers. Parallelization 
seems fundamentally feasible because functional programming is 
inherently compatible with parallel computing. 